Abstract

The goal of a denoising algorithm is to recover a signal from its noise-corrupted observations. Perfect 
recovery is seldom possible and performance is measured under a given single-letter fidelity criterion. 
For discrete signals corrupted by a known discrete memoryless channel, the DUDE was recently shown 
to perform this task asymptotically optimally, without knowledge of the statistical properties of the 
source. In the present work we address the scenario where, in addition to the lack of knowledge of 
the source statistics, there is also uncertainty in the channel characteristics. We propose a family of 
discrete denoisers and establish their asymptotic optimality under a minimax performance criterion 
which we argue is appropriate for this setting. As we show elsewhere, the proposed schemes can also be 
implemented computationally efficiently. 

1 Introduction 

Discrete sources corrupted by Discrete Memoryless Channels (DMCs) are encountered naturally in many
fields, including information theory, computer science, and biology. The reader is referred to [15] for examples,
as well as references to some of the related literature. It was shown in [15] that optimum denoising of a
finite-alphabet source corrupted by a known invertible^1 DMC can be achieved asymptotically, in the size
of the data, without knowledge of the source statistics. It was further shown that the scheme achieving
this performance, the Discrete Universal DEnoiser (DUDE), enjoys properties that are desirable from a
computational viewpoint.

The assumption of a known channel in the setting of [15] is integral to the construction of the DUDE
algorithm. This assumption is indeed a realistic one in many practical scenarios where the noisy medium
through which the data is acquired is well characterized statistically. Furthermore, the computational
simplicity of the DUDE allows it to be used in certain cases when the statistical properties of the DMC may
not be fully known, for example, when there is a human observer to give feedback on the quality of the
reconstruction. In such a case, the human observer can scan through the various possible DMCs,
implementing the DUDE for each DMC, and select the one which gives the best reconstruction. Such a method
can be used to extend the scheme of [15] to the case of channel uncertainty when it is reasonable to expect
the availability of feedback on the quality of the reconstruction.

*Authors are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305. Email:
ggemelos@stanford.edu, styrmir@stanford.edu, and tsachy@stanford.edu.

The work of the first two authors was supported by MURI Grant DAAD-19-99-1-0215 and NSF Grants CCR-0311633 and 
CCF-0512140. The work of the third author was supported in part by NSF Grant CCR-0312839. 

^1 Throughout this paper, an "invertible DMC" is one whose associated channel matrix is of full row rank.

Unfortunately, such feedback is not realistic in many scenarios. For example, in applications involving
DNA data, a human observer would probably find the task of determining which of two reconstructions of
a corrupted nucleotide sequence is closer to the original quite difficult. Other examples include applications
involving the processing of large databases of noisy images [Sj and those involving medical images [17].
In the latter, human feedback is often too subjective. In such cases, an automated algorithm for discrete
image denoising which can accommodate uncertainty in the statistical characteristics of the noisy medium is
desired. With this motivation in mind, in this paper we address the problem of denoising when, in addition
to the lack of knowledge of the source statistics, there is also uncertainty in the channel characteristics.

It turns out that the introduction of uncertainty in the channel characteristics into the setting of [15]
results in a fundamentally different problem, calling for new performance criteria and denoising schemes
which are principally different from those of [15]. The main reason for this divergence is that in the presence
of channel uncertainty, the distribution of the noise-corrupted signal does not uniquely determine the
distribution of the underlying clean signal, a property which is key to the DUDE of [15] and its accompanying
performance guarantees. To illustrate this difference, consider the simple example of a Bernoulli source
corrupted by a Binary Symmetric Channel (BSC). In this example, the noise-corrupted signal is also Bernoulli
with some parameter $\delta < 1/2$. For simplicity, we will only consider two possibilities: either the clean signal
is the "all zero" signal corrupted by a BSC with crossover probability $\delta$, or the clean signal is Bernoulli($\delta$)
passed through a noise-free channel.^2 It is easy to see that solely knowing that the noise-corrupted signal is
Bernoulli($\delta$), there is no way to distinguish between the two possibilities above. It is therefore impossible to
uniquely identify the distribution of the underlying source. Degenerate as this example may be, it highlights
the following points, which are key to our present setting and its basic difference from that of [15]:

1. Even with complete knowledge of the noise-corrupted signal statistics, Bernoulli($\delta$) in our example,
there is no way of inferring the distribution of the underlying source.

2. There exists no denoising scheme that is simultaneously optimal for all sources (two in our example)
which can give rise to the noise-corrupted signal statistics.

3. A scheme that minimizes the worst case loss has to be randomized.^3 In the example above, the scheme
that minimizes the worst case bit error rate is readily seen to be the one which randomizes, equiprobably,
between using the observed noisy symbol as the estimate of the clean symbol and estimating with the
symbol 0 regardless of the observation. Such a scheme would achieve a bit error rate of $\delta/2$ under both
possible sources discussed above.
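
As a quick numerical check of the third point, the following Python sketch evaluates the bit error rate of a scheme that says what it sees with probability `gamma` and says 0 otherwise, under each of the two hypotheses of the example. The function names and the grid search over `gamma` are illustrative assumptions and are not part of the schemes developed in this paper.

```python
from fractions import Fraction

def bit_error_rates(delta, gamma):
    """Return the bit error rate under each hypothesis of the example.

    Hypothesis A: all-zero source through a BSC with crossover delta.
    Hypothesis B: Bernoulli(delta) source through a noise-free channel.
    gamma is the probability of outputting the observed symbol;
    with probability 1 - gamma the scheme outputs 0.
    """
    # Under A, "say what you see" errs w.p. delta and "say 0" never errs.
    err_a = gamma * delta
    # Under B, "say what you see" never errs and "say 0" errs w.p. delta.
    err_b = (1 - gamma) * delta
    return err_a, err_b

def worst_case(delta, gamma):
    """Worst case bit error rate over the two hypotheses."""
    return max(bit_error_rates(delta, gamma))

delta = Fraction(1, 10)
grid = [Fraction(i, 100) for i in range(101)]
best_gamma = min(grid, key=lambda g: worst_case(delta, g))
print(best_gamma, worst_case(delta, best_gamma))
```

On this grid the search selects equiprobable randomization, and the resulting worst case error equals $\delta/2$, in agreement with the claim above.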

As is evident through this example, the key issue is that while in the setting of [15] there is a one-to-one
correspondence between the channel output distribution and its input distribution, a channel output
distribution can correspond to many input distributions in the presence of channel uncertainty. This point
has also been a central theme in 01 112| , where fundamental performance limits are characterized for rate
constrained denoising under uncertainty in both the source and channel characteristics.^4

^2 Throughout this paper, Bernoulli($\delta$) refers to a Bernoulli process with parameter $\delta$.

^3 Either in "space" (i.e., true randomization) or in time (i.e., time sharing for deterministic estimates).

Under these circumstances, given any noise-corrupted signal, a seemingly natural criterion under which
the performance of a denoising scheme should be judged is its worst case performance under all
source-channel pairs that can give rise to the observed noise-corrupted signal statistics. In line with this conclusion,
as a way to evaluate the merits of a denoising scheme, we look at a scheme's worst case performance assessed
by a third party that has complete knowledge of both the noise-corrupted signal distribution and the whole
noise-corrupted signal realization. Under this criterion, we define the notion of "sliding window denoisability"
to be the best performance attainable by a sliding window scheme of any order. This can be considered our
setting's analogue to the "sliding window denoisability" of [15] (which in turn was inspired by the finite-state
compressibility of ^H], the finite-state predictability of [7], and the finite-state noisy predictability of [14]).
By definition, this is a fundamental lower bound on the performance of any sliding window scheme. Our
main contribution is the presentation of a family of sliding window denoisers that asymptotically attains this
lower bound.

The problem of denoising discrete sources corrupted by an unknown DMC has been previously considered
in the context of state estimation in the literature on hidden Markov models (cf. j^I and the many references
therein). In that setting, one assumes the source to be a Markov process. The EM algorithm of |21 is then
used to obtain the maximum likelihood estimates of the process and channel parameters. One then denoises
optimally assuming the estimated values of the source and channel parameters. This approach is widely
employed in practice and has been quite successful in a variety of applications. Other than the hidden
Markov model method, the only other general approach we are aware of for discrete denoising under channel
uncertainty is the DUDE with "feedback" discussed above. For the special case of binary signals corrupted
by a BSC, an additional scheme was suggested in [15, Subsection 8-C] which makes use of a particular
estimate of the channel crossover probability.

These existing schemes lack solid theoretical performance guarantees. Insofar as the hidden Markov model
based schemes go, performance guarantees are available only for the case where the underlying source is a
Markov process. Furthermore, these performance guarantees stipulate "identifiability" conditions (cf. [Tllll|
and references therein), which do not hold in our setting of channel uncertainty. The more recent approach of
employing the DUDE tailored to an estimate of the channel characteristics is shown in [5] to be suboptimal
with respect to the worst case performance criterion we propose. This suggests that the schemes we introduce
in this work are of an essentially different nature than the DUDE [15].

After we state the problem in Section 2, we turn to describe our denoiser in Section 3. In Section 4
we concretely introduce the performance measure and performance benchmarks that were qualitatively
described above for the case where there are a finite number of possible channels. In Section 5 we state
our main results, which assess the performance of the denoisers of Section 3 and guarantee their universal
asymptotic optimality under the performance criteria of Section 4. To focus on the essentials of the problem,
we assume in Section 3 that the channel uncertainty set is finite. In Section 6 we extend the performance
measure of Section 4 and the guarantees of Section 5 to the case of an infinite number of possible channels.
The proofs of the results are left to the appendix.

^4 In that line of work, Shannon theoretic aspects of the problem are considered and attention is restricted to memoryless
sources. Our current framework considers noise-free sources that are arbitrarily distributed.

2 Problem Statement 

Before formally stating the problem, we introduce some notation: Upper case letters will denote random
quantities while the corresponding lower case letters will denote individual realizations. Bold notation
will be used to represent doubly infinite sequences. For example, $\mathbf{X}$ will denote the stochastic process
$\{\ldots, X_{-1}, X_0, X_1, \ldots\}$ and $\mathbf{x} = \{\ldots, x_{-1}, x_0, x_1, \ldots\}$ a particular realization. Furthermore, for indices
$i \le j$, the vector $(X_i, \ldots, X_j)$ will be denoted by $X_i^j$. We will omit the subscript when $i = 1$.

Using the above notation, the problem statement is as follows: Let A be a collection of invertible DMCs. 
A source X is passed through an unknown DMC in A and we denote the output process as Z. The process 
Z is thus a noise-corrupted version of the X process. We assume that the components of both X and Z take 
on values in a finite alphabet denoted by $\mathcal{A}$. Given $Z^n$ and A, we wish to reconstruct $X^n$ under a given
single-letter loss function, $\Lambda : \mathcal{A} \times \mathcal{A} \to \mathbb{R}^+$. For $a, b \in \mathcal{A}$, $\Lambda(a, b)$ can be interpreted as the loss incurred
when reconstructing the symbol $a$ with the symbol $b$. Here we make the assumption that the components of
the reconstruction also lie in the finite alphabet $\mathcal{A}$. Given $x^n, \hat{x}^n \in \mathcal{A}^n$, we denote

$$\Lambda(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} \Lambda(x_i, \hat{x}_i).$$
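
The normalized cumulative loss defined above can be sketched in a few lines of Python. The representation of the loss function as a nested list indexed by integer symbols, and the name `normalized_loss`, are assumptions made for illustration only.

```python
def normalized_loss(loss, clean, estimate):
    """Average single-letter loss between a clean sequence and its estimate.

    loss[a][b] is the loss incurred when the symbol a is reconstructed as b.
    Symbols are assumed to be integers 0, 1, ..., |alphabet| - 1.
    """
    if len(clean) != len(estimate) or len(clean) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(loss[x][x_hat] for x, x_hat in zip(clean, estimate))
    return total / len(clean)

# Hamming loss on the binary alphabet.
hamming = [[0, 1],
           [1, 0]]

x = [0, 1, 1, 0]
x_hat = [0, 1, 0, 0]
print(normalized_loss(hamming, x, x_hat))  # one error out of four symbols
```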



3 Description of the Algorithm 

Inherent in the setup of our problem is the uncertainty regarding which channel corrupted the clean source,
as depicted in Figure 1. We are given that the channel lies in an uncertainty set A, and the uncertainty set
is assumed to be fixed and known to the denoiser. The description of the denoiser is broken into two parts.
In Section 3-A we present an overview of the development of the denoiser, while a detailed construction of
the denoiser is presented in Section 3-B.



[Block diagram: $X^n$ -> Unknown DMC in A -> Denoiser -> $\hat{X}^n$]

Figure 1: A noiseless source $X^n$ is corrupted by a channel known to lie in an uncertainty set A, and we observe
the output $Z^n$.



A Outline of Algorithm 

For simplicity, we start by limiting A to be a finite collection of invertible DMCs. The case of |A| being
infinite requires a more technical analysis which will be discussed in Section 6. Throughout the paper, we
confine our discussion to sliding window denoisers. A sliding window denoiser of order $k$ works as follows:
When denoising a particular symbol, it considers the $k$ symbols preceding it and the $k$ symbols succeeding
it. These $k$ symbols before and after the current symbol form a two-sided context of the current symbol. In
particular, if we denote the current symbol by $z_0$, the two-sided context is $z_{-k}^{-1}$ and $z_1^k$. In addition to the
usual deterministic denoisers, we allow randomized denoisers. A randomized denoiser is a denoiser whose
output is a distribution from which a reconstruction must be drawn as a final step. Therefore, we can think
of a sliding window denoiser, whether deterministic or randomized, as a mapping $f : \mathcal{A}^{2k+1} \to \mathcal{S}(\mathcal{A})$. Here, for a
given alphabet $\mathcal{A}$, $\mathcal{S}(\mathcal{A})$ is used to denote the $|\mathcal{A}|$-dimensional probability simplex.^5 If $f$ is a sliding window
denoiser, we denote its simplex-valued output by $f([z_{-k}^{-1}, z_0, z_1^k])$ or $f(z_{-k}^k)$. We can use a $k$-order sliding
window denoiser $f$ to denoise $Z^n$ by drawing the $i$-th reconstruction according to the distribution $f(Z_{i-k}^{i+k})$.
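
To make the mechanics of a randomized sliding window denoiser concrete, the following Python sketch applies a $k$-order rule to a finite sequence. The rule `majority` is only a toy example of a mapping from windows to the simplex and is not one of the denoisers studied here. Leaving the $k$ symbols at each edge unchanged is likewise an assumption made for this illustration.

```python
import random

def denoise(z, k, f, rng):
    """Apply a k-order sliding window denoiser f to the sequence z.

    f maps a tuple of 2k+1 symbols to a list of probabilities over the
    reconstruction alphabet {0, ..., m-1}. Each reconstruction is drawn
    from that distribution. Edge symbols are copied unchanged (assumption).
    """
    out = list(z)
    for i in range(k, len(z) - k):
        window = tuple(z[i - k:i + k + 1])
        probs = f(window)
        # Draw the i-th reconstruction according to the distribution f(window).
        out[i] = rng.choices(range(len(probs)), weights=probs)[0]
    return out

def majority(window):
    """Toy deterministic rule, written as a point mass on the simplex."""
    ones = sum(window)
    return [0.0, 1.0] if 2 * ones > len(window) else [1.0, 0.0]

z = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
x_hat = denoise(z, 1, majority, random.Random(0))
print(x_hat)
```

A deterministic denoiser is the special case in which `f` always returns a point mass, as `majority` does. A randomized denoiser returns a distribution with more than one nonzero entry for at least one window.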

Let $\Pi$ be some channel in A, $P_{Z_{-k}^k}$ the probability distribution on $Z_{-k}^k$, and $f$ a sliding window denoiser.^6
We now assume there exists a function $G_k$ that, when given $\Pi$, $P_{Z_{-k}^k}$ and $f$, evaluates the performance of
the denoiser $f$ on that particular $\Pi$ and $P_{Z_{-k}^k}$. Here performance is measured by the expected loss, under $\Lambda$,
incurred when estimating $X_0$ based on $f(Z_{-k}^k)$. This is denoted by $G_k(P_{Z_{-k}^k}, \Pi, f)$. In the next subsection,
we explicitly derive this function.

The main idea behind our construction is to look at the worst case performance of a particular denoiser
$f$ over all the channels in the uncertainty set A. Since $G_k$ gives the performance of $f$ for a given channel $\Pi$,
we can take the maximum over all the channels in A. Define

$$J_k(P_{Z_{-k}^k}, A, f) = \max_{\Pi \in A} G_k(P_{Z_{-k}^k}, \Pi, f).$$

By definition, $J_k$ is the worst case performance of denoiser $f$ over all the channels in A. Let $\mathcal{F}_k$ denote the
set of all $k$-order sliding window denoisers. We now define the min-max denoiser,

$$f_{MM_k}[P_{Z_{-k}^k}, A] = \arg\min_{f \in \mathcal{F}_k} J_k(P_{Z_{-k}^k}, A, f). \qquad (1)$$

By construction, $f_{MM_k}$ minimizes the worst expected loss over all channels in A. Unfortunately, employment
of this scheme requires knowledge of the noise-corrupted source distribution $P_{Z_{-k}^k}$, which is not given in
this setting. Our approach is to employ $f_{MM_k}$ using an estimate of $P_{Z_{-k}^k}$. In particular, letting $\hat{Q}^{2k+1}[z^n]$
denote the $(2k+1)$-order empirical distribution induced by $z^n$, we look at the $n$-block denoiser defined by

$$\hat{X}_i \sim f_{MM_k}[\hat{Q}^{2k+1}[z^n], A](z_{i-k}^{i+k}).$$

^5 Similarly, we will use $\mathcal{S}^k(\mathcal{A})$ to denote the simplex on $k$-tuples on the alphabet $\mathcal{A}$. Also, $\mathcal{S}^\infty(\mathcal{A})$ will denote the set of all
distributions on doubly infinite sequences that take values in $\mathcal{A}$. If no alphabet is given, the alphabet $\mathcal{A}$ is assumed.

^6 Throughout, given a random variable $X$, $P_X$ will be used to represent the associated probability law. Similar notation will
also be used for vectors of random variables, such as $P_{X^k}$ to denote the probability law associated with the vector $X^k$. This
will hold even for doubly infinite vectors like $\mathbf{Z}$.

Up to now in the development of our denoiser, the uncertainty set A remained unchanged. However, it
is reasonable to assume that knowledge gained from our observations of the output process $\mathbf{Z}$ can be used
to modify the uncertainty set. In order to make this intuition more rigorous, we make use of the following
definition. Given an observed output distribution $P_{Z^k}$, a channel $\Pi$ is said to be $k$-feasible if there exists a
valid $k$-order distribution $P_{X^k}$ such that $\Pi * P_{X^k} = P_{Z^k}$.^7 As an example, we can look at a Bernoulli($p$) source
corrupted by a binary symmetric channel with unknown crossover probability $\delta$, and assume $p, \delta < 1/2$.
In this case, the output process will also be a Bernoulli source with parameter $q = p(1 - \delta) + (1 - p)\delta$.
Then it is clear that for any $k$, no binary symmetric channel with crossover probability greater than $q$ is
$k$-feasible. Similarly, all binary symmetric channels with crossover probability less than $q$ are $k$-feasible for
all $k$. We shall say that a channel $\Pi$ is feasible with respect to the noise-corrupted source distribution $P_{\mathbf{Z}}$ if
$\Pi$ is $k$-feasible with respect to $P_{Z^k}$ for all $k$.
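
The feasibility condition in the Bernoulli example can be checked numerically by inverting the relation $q = p(1-\delta) + (1-p)\delta$ and testing whether the implied source parameter is a valid probability. The Python sketch below does this for a few candidate crossover probabilities. It only examines the single-symbol marginal, which is an assumption that suffices for the memoryless example above but not for general output distributions; the function names are hypothetical.

```python
from fractions import Fraction

def implied_source_parameter(q, delta):
    """Solve q = p(1 - delta) + (1 - p) delta for p, assuming delta != 1/2."""
    return (q - delta) / (1 - 2 * delta)

def is_feasible_bsc(q, delta):
    """A BSC(delta) is feasible for a Bernoulli(q) output iff p lies in [0, 1]."""
    p = implied_source_parameter(q, delta)
    return 0 <= p <= 1

q = Fraction(1, 4)
candidates = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
feasible = [d for d in candidates if is_feasible_bsc(q, d)]
print(feasible)
```

With $q = 1/4$, the candidates $1/10$ and $1/5$ pass the check, while $3/10 > q$ would require a negative source parameter and is ruled out.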

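Using this concept of feasibility, given $P_{Z_{-k}^k}$, define

$$\mathcal{C}_k(P_{Z_{-k}^k}) = \left\{ \Pi \in \mathcal{C}(\mathcal{A}) : \exists\, P_{X_{-k}^k} \in \mathcal{S}^{2k+1} \text{ s.t. } \Pi * P_{X_{-k}^k} = P_{Z_{-k}^k} \right\}, \qquad (2)$$

where $\mathcal{C}(\mathcal{A})$ is the set of all invertible channels whose input and output take values in the alphabet $\mathcal{A}$. Recall
that $\mathcal{S}^{2k+1}$ denotes the probability simplex on $(2k+1)$-tuples in $\mathcal{A}$. Therefore, $\mathcal{C}_k(P_{Z_{-k}^k})$ is simply the set of
all $(2k+1)$-feasible channels with respect to the output distribution $P_{Z_{-k}^k}$. With a slight abuse of notation,
we will also use $\mathcal{C}_k(P_{\mathbf{Z}})$ to represent $\mathcal{C}_k(P_{Z_{-k}^k})$. Furthermore, we will use $\mathcal{C}_\infty(P_{\mathbf{Z}})$ to denote the set of feasible
channels, i.e. those channels which are in $\mathcal{C}_k(P_{\mathbf{Z}})$ for all $k$.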

With our Bernoulli example in mind, we see that it need not be the case that given $P_{Z_{-k}^k}$, all the channels
in A are $(2k+1)$-feasible. Hence we can rule out all channels in our uncertainty set A which are found not
to be $(2k+1)$-feasible with respect to the observed output distribution. In other words, we can trim the
uncertainty set down from A to $A \cap \mathcal{C}_k(P_{Z_{-k}^k})$. This added information motivates the construction of our
denoiser: We now define the $n$-block denoiser using the function $f_{MM_k}$ from (1) by letting its estimate of $X_i$
be

$$\hat{X}_i \sim f_{MM_k}\left[\hat{Q}^{2k+1}[z^n],\; A \cap \mathcal{C}_l(\hat{Q}^{2l+1}[z^n])\right](z_{i-k}^{i+k}). \qquad (3)$$



Note that this denoiser depends on parameters $k$, $l$ and the a-priori uncertainty set A. We denote this $n$-block
denoiser by $\hat{X}^{n,k,l}$. For the special case where we know $A \subseteq \mathcal{C}_\infty(P_{\mathbf{Z}})$, let $\hat{X}^{n,k}$ denote the denoiser defined
by

$$\hat{X}_i \sim f_{MM_k}\left[\hat{Q}^{2k+1}[z^n],\; A\right](z_{i-k}^{i+k}). \qquad (4)$$

B Construction of Denoiser



We now give a more detailed account of the construction of $\hat{X}^{n,k,l}$ and $\hat{X}^{n,k}$, and elaborate on technical
details that arise in their derivation. Assume we are given a channel $\Pi \in A$, a $(2k+1)$-order output
distribution $P_{Z_{-k}^k}$, and a sliding window denoiser $f$. For a fixed two-sided context $Z_{-k}^{-1} = z_{-k}^{-1}$ and $Z_1^k = z_1^k$,
$P_{Z_{-k}^k}$ induces a conditional distribution on $Z_0$, denoted by $P_{Z_0 | Z_{-k}^{-1} = z_{-k}^{-1}, Z_1^k = z_1^k}$ or, in short, $P_{Z_0 | z_{-k}^{-1}, z_1^k}$.

We now wish to derive a function $F_k(P_{Z_0|z_{-k}^{-1},z_1^k}, \Pi, f)$ which gives the expected loss, with respect to
$\Lambda$, incurred when we estimate $X_0$ with the denoiser $f(Z_{-k}^k)$ given that $Z_{-k}^{-1} = z_{-k}^{-1}$ and $Z_1^k = z_1^k$. Note
that when $P_{Z_{-k}^k}$ is a channel output distribution and there exists an input distribution $P_{X_{-k}^k}$ such that
$\Pi * P_{X_{-k}^k} = P_{Z_{-k}^k}$,^7 it is easy to show that $P_{X_0|z_{-k}^{-1},z_1^k} = \Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}$ (cf., e.g., [15, Section 3]). Therefore,
the expected loss calculated by the function $F_k$ can be viewed as a twofold expectation, with respect to
$P_{X_0|z_{-k}^{-1},z_1^k}$ and the denoiser. We can therefore write $F_k$ out as:

$$F_k(P_{Z_0|z_{-k}^{-1},z_1^k}, \Pi, f) = \sum_{x \in \mathcal{A},\, z \in \mathcal{A}} \left[\Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}\right]_x \Pi(x, z) \sum_{a \in \mathcal{A}} \Lambda(x, a)\, f([z_{-k}^{-1}, z, z_1^k])[a] \qquad (5)$$

$$= \sum_{z \in \mathcal{A}} \mathbf{1}^T \left[ \left(\Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}\right) \odot \pi_z \odot \left[\Lambda \cdot f([z_{-k}^{-1}, z, z_1^k])\right] \right], \qquad (6)$$

where:

• Given a channel $\Pi$ and $x, z \in \mathcal{A}$, $\Pi(x, z)$ denotes the probability the channel output is $z$ given the
input is $x$. With a slight abuse of notation, $\Pi$ without an argument will denote the channel transition
matrix. Similarly, $\Lambda$ without an argument will be used to denote the $|\mathcal{A}| \times |\mathcal{A}|$ matrix whose $(x, z)$-th
entry is given by $\Lambda(x, z)$.

• $\left[\Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}\right]_x$ denotes the $x$-th component of the column vector $\Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}$.

• $f([z_{-k}^{-1}, z, z_1^k])[a]$ is the $a$-th element of the $|\mathcal{A}|$-dimensional simplex member $f([z_{-k}^{-1}, z, z_1^k])$, and
$f([z_{-k}^{-1}, z, z_1^k])$ without a trailing index is the column vector whose $a$-th component is
$f([z_{-k}^{-1}, z, z_1^k])[a]$. Recall that a denoiser $f$ is a mapping $\mathcal{A}^{2k+1} \to \mathcal{S}(\mathcal{A})$.

• $\mathbf{1}$ denotes the "all ones" $|\mathcal{A}|$-dimensional column vector.

• $\odot$ denotes the Hadamard product, that is, component-wise multiplication.

• $\pi_z$ denotes the $|\mathcal{A}|$-dimensional column vector whose $a$-th component is $\Pi(a, z)$.

^7 Throughout, given a distribution on $k$-tuples $P_{X^k}$, $\Pi * P_{X^k}$ will denote the $k$-tuple distribution of the output of a DMC
whose transition matrix is $\Pi$ and whose input has the $k$-order distribution $P_{X^k}$.

Equipped with the function $F_k$, we can now construct $G_k$. Recall that for a given channel $\Pi$, a $(2k+1)$-order
output distribution $P_{Z_{-k}^k}$, and a denoiser $f$, $G_k$ calculates the expected loss with respect to $\Lambda$. Hence $F_k$ can
be thought of as $G_k$ conditioned on a particular context $z_{-k}^{-1}$ and $z_1^k$. It follows that

$$G_k(P_{Z_{-k}^k}, \Pi, f) = \sum_{z_{-k}^{-1},\, z_1^k \in \mathcal{A}^k} F_k(P_{Z_0|z_{-k}^{-1},z_1^k}, \Pi, f)\; P_{Z_{-k}^k}\{Z_{-k}^{-1} = z_{-k}^{-1}, Z_1^k = z_1^k\}, \qquad (7)$$

where $P_{Z_{-k}^k}\{Z_{-k}^{-1} = z_{-k}^{-1}, Z_1^k = z_1^k\}$ is the probability under the law $P_{Z_{-k}^k}$ that $Z_{-k}^{-1} = z_{-k}^{-1}$ and $Z_1^k = z_1^k$.
Substituting (6) in (7) and simplifying gives



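$$G_k(P_{Z_{-k}^k}, \Pi, f) = \sum_{z_{-k}^{-1},\, z_1^k \in \mathcal{A}^k} \sum_{z \in \mathcal{A}} \mathbf{1}^T \left[ \left(\Pi^{-T} P_{Z_0|z_{-k}^{-1},z_1^k}\right) \odot \pi_z \odot \left[\Lambda \cdot f([z_{-k}^{-1}, z, z_1^k])\right] \right] P_{Z_{-k}^k}\{Z_{-k}^{-1} = z_{-k}^{-1}, Z_1^k = z_1^k\}. \qquad (8)$$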



Following the development in Section 3-A, we now use $G_k$ in the construction of $J_k$:

$$J_k(P_{Z_{-k}^k}, A, f) = \max_{\Pi \in A} G_k(P_{Z_{-k}^k}, \Pi, f).$$

We make the following two observations. First, the function $J_k(P_{Z_{-k}^k}, A, f)$ is continuous in $f$, i.e. continuous
in the space of all $(2k+1)$-order sliding window denoisers. This is an easily verified consequence of the
definition of $G_k$. The second observation requires the construction of a metric, $\rho$, between sets of channels.
Recall that $\mathcal{C}(\mathcal{A})$ denotes the set of invertible channels whose input and output take values in the alphabet
$\mathcal{A}$. For nonempty $A, B \subseteq \mathcal{C}(\mathcal{A})$ we define

$$\rho(A, B) = \sup_{a \in A} \inf_{b \in B} \|a - b\| + \sup_{b \in B} \inf_{a \in A} \|a - b\|,$$

where $\| \cdot \|$ denotes the $L^\infty$ norm. With respect to the metric $\rho$, $J_k(P_{Z_{-k}^k}, A, f)$ is uniformly continuous in
A. More specifically, for all $A' \subseteq A$,

$$J_k(P_{Z_{-k}^k}, A, f) - J_k(P_{Z_{-k}^k}, A', f) \le \phi_k(\rho(A, A')), \qquad (9)$$

for some $\phi_k$, independent of $P_{Z_{-k}^k}$ and $f$, such that $\phi_k(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$. For example,

Me) = 



|^P'=+iA„,axinax||n-i| 

is readily verified to satisfy (9).
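
For finite sets of channels, the suprema and infima in the definition of $\rho$ become maxima and minima, and the metric can be computed directly. The Python sketch below is a minimal illustration. It assumes that the norm of a matrix difference is the largest absolute entry, and it represents channels as nested lists; both are choices made for this example.

```python
from fractions import Fraction

def sup_norm(m1, m2):
    """Largest absolute entry of the difference of two channel matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))

def rho(set_a, set_b):
    """Sum of the two one-sided distances between finite sets of channels."""
    forward = max(min(sup_norm(a, b) for b in set_b) for a in set_a)
    backward = max(min(sup_norm(a, b) for a in set_a) for b in set_b)
    return forward + backward

def bsc(delta):
    """Transition matrix of a binary symmetric channel, rows indexed by input."""
    return [[1 - delta, delta], [delta, 1 - delta]]

full_set = [bsc(Fraction(1, 10)), bsc(Fraction(1, 5))]
trimmed_set = [bsc(Fraction(1, 10))]
print(rho(full_set, trimmed_set))
```

Here trimming the channel with crossover probability $1/5$ from the set moves it a distance of $1/10$ away from the original set, while the distance of a set to itself is zero.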

Continuing the development, as per our previous definition,

$$f_{MM_k}[P_{Z_{-k}^k}, A] = \arg\min_{f \in \mathcal{F}_k} J_k(P_{Z_{-k}^k}, A, f), \qquad (10)$$

selecting an arbitrary achiever when it is not unique. Note that the minimum is achieved since, as observed,
$J_k$ is continuous in $f$ and the space of all $(2k+1)$-order sliding window denoisers is compact. Equations (3)
and (4) then complete the construction of the denoisers.
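
The chain from the per-context expected loss to its average over contexts and then to the worst case over channels can be sketched for the binary alphabet as follows. This is an illustration under stated assumptions: the alphabet is $\{0, 1\}$, a channel is a nested list with `channel[x][z]` the probability of output `z` given input `x`, a denoiser is a dictionary mapping each context to a table `f_ctx[z][a]` giving the probability of reconstructing `a` when the centre symbol is `z`, and all function names are hypothetical.

```python
from fractions import Fraction

def inverse_transpose_2x2(m):
    """Inverse transpose of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -c / det], [-b / det, a / det]]

def context_loss(p_z0, channel, loss, f_ctx):
    """Expected loss for one context, given P(Z_0 = z | context) in p_z0."""
    inv_t = inverse_transpose_2x2(channel)
    # Distribution of the clean symbol given the noisy context.
    p_x0 = [sum(inv_t[x][z] * p_z0[z] for z in range(2)) for x in range(2)]
    total = 0
    for x in range(2):
        for z in range(2):
            joint = p_x0[x] * channel[x][z]
            total += joint * sum(loss[x][a] * f_ctx[z][a] for a in range(2))
    return total

def average_loss(contexts, channel, loss, f):
    """Average of the per-context loss, weighted by context probabilities."""
    return sum(p_ctx * context_loss(p_z0, channel, loss, f[ctx])
               for ctx, (p_ctx, p_z0) in contexts.items())

def worst_case_loss(contexts, channels, loss, f):
    """Maximum of the average loss over a finite uncertainty set."""
    return max(average_loss(contexts, ch, loss, f) for ch in channels)

def bsc(delta):
    return [[1 - delta, delta], [delta, 1 - delta]]

hamming = [[0, 1], [1, 0]]
channels = [bsc(Fraction(1, 10)), bsc(Fraction(1, 5))]
# Order k = 0: a single empty context with P(Z_0 = 1) = 1/4.
contexts = {(): (Fraction(1), [Fraction(3, 4), Fraction(1, 4)])}
say_what_you_see = {(): [[1, 0], [0, 1]]}
say_all_zeros = {(): [[1, 0], [1, 0]]}
print(worst_case_loss(contexts, channels, hamming, say_what_you_see))
print(worst_case_loss(contexts, channels, hamming, say_all_zeros))
```

The min-max denoiser would then be obtained by minimizing `worst_case_loss` over all tables `f`, which this sketch does not attempt.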

C Binary Alphabet 

Before moving on, it may be illustrative to explore the form of $\hat{X}^{n,k,l}$ for the binary case. In particular,
we will look at the case of denoising a binary signal corrupted by an unknown Binary Symmetric Channel 
(BSC) with respect to the Hamming loss. We suppose it is known that the BSC lies in some finite set A. 
We will assume that all the channels in A have a crossover probability less than 1/2. 

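The first step in constructing our binary denoiser is finding the binary version of $F_k$. Let us fix a particular
context $z_{-k}^{-1}$ and $z_1^k$. As we recall from (5), $F_k$ is a function of a distribution $P_{Z_0|z_{-k}^{-1},z_1^k}$, a channel $\Pi$, and a
denoiser $f$. In the binary case, $P_{Z_0|z_{-k}^{-1},z_1^k}$ is completely specified by the conditional probability that $Z_0 = 1$.
We will denote this probability as $\alpha(z_{-k}^{-1}, z_1^k)$. The channel is a BSC and therefore defined by its crossover
probability, denoted by $\delta < 1/2$. Also recall that a denoiser $f$ is a mapping $\{0, 1\}^{2k+1} \to \mathcal{S}(\{0, 1\})$. Hence
for our two-sided context, $f$ can be completely defined by the probability assigned to 1 given $Z_0 = 1$, denoted
by $d_1(z_{-k}^{-1}, z_1^k)$, and the probability assigned to 1 given $Z_0 = 0$, denoted by $d_0(z_{-k}^{-1}, z_1^k)$. Finally, recall that
$F_k$ measures the expected loss, here with respect to the Hamming loss, incurred when we estimate $X_0$ with
$f(Z_{-k}^k)$ given that $Z_{-k}^{-1} = z_{-k}^{-1}$ and $Z_1^k = z_1^k$. With this in mind, we write out $F_k$ for the binary case as

$$\begin{aligned}
F_k(P_{Z_0|z_{-k}^{-1},z_1^k}, \Pi, f)
&= \Lambda(0, 0) \left[\Pr\{X_0 = 0, Z_0 = 1\}(1 - d_1) + \Pr\{X_0 = 0, Z_0 = 0\}(1 - d_0)\right] \\
&\quad + \Lambda(0, 1) \left[\Pr\{X_0 = 0, Z_0 = 1\} d_1 + \Pr\{X_0 = 0, Z_0 = 0\} d_0\right] \\
&\quad + \Lambda(1, 0) \left[\Pr\{X_0 = 1, Z_0 = 1\}(1 - d_1) + \Pr\{X_0 = 1, Z_0 = 0\}(1 - d_0)\right] \\
&\quad + \Lambda(1, 1) \left[\Pr\{X_0 = 1, Z_0 = 1\} d_1 + \Pr\{X_0 = 1, Z_0 = 0\} d_0\right] \\
&= \left[\Pr\{X_0 = 0, Z_0 = 1\} d_1 + \Pr\{X_0 = 0, Z_0 = 0\} d_0\right] \\
&\quad + \left[\Pr\{X_0 = 1, Z_0 = 1\}(1 - d_1) + \Pr\{X_0 = 1, Z_0 = 0\}(1 - d_0)\right] \\
&= \frac{\delta(1 - \alpha - \delta)}{1 - 2\delta} d_1 + \frac{(1 - \delta)(1 - \alpha - \delta)}{1 - 2\delta} d_0
 + \frac{(1 - \delta)(\alpha - \delta)}{1 - 2\delta} (1 - d_1) + \frac{\delta(\alpha - \delta)}{1 - 2\delta} (1 - d_0) \\
&= \frac{\delta(1 - \alpha - \delta) d_1 + (1 - \delta)(1 - \alpha - \delta) d_0 + (1 - \delta)(\alpha - \delta)(1 - d_1) + \delta(\alpha - \delta)(1 - d_0)}{1 - 2\delta}, \qquad (11)
\end{aligned}$$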

where we dropped the dependence of $\alpha(z_{-k}^{-1}, z_1^k)$, $d_1(z_{-k}^{-1}, z_1^k)$, and $d_0(z_{-k}^{-1}, z_1^k)$ on $z_{-k}^{-1}$ and $z_1^k$ for notational
compactness. Using (11), we can then follow the construction in Section 3-B to derive the binary version of the
denoiser $\hat{X}^{n,k,l}$. The practical implementation of this denoiser is discussed in detail in |H].
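
As an illustration of how the binary expression for $F_k$ can be used, the Python sketch below evaluates it for a single context and searches a grid of pairs $(d_0, d_1)$ for the one with the smallest worst case value over a finite set of crossover probabilities. The restriction to one context and the brute-force grid are simplifying assumptions made here; they do not represent the efficient implementation referred to above.

```python
def binary_context_loss(alpha, delta, d0, d1):
    """Expected Hamming loss for one context in the binary BSC case.

    alpha: P(Z_0 = 1 | context); delta: BSC crossover probability (< 1/2);
    d0, d1: probability of outputting 1 when Z_0 = 0 and Z_0 = 1.
    """
    num = (delta * (1 - alpha - delta) * d1
           + (1 - delta) * (1 - alpha - delta) * d0
           + (1 - delta) * (alpha - delta) * (1 - d1)
           + delta * (alpha - delta) * (1 - d0))
    return num / (1 - 2 * delta)

def minimax_single_context(alpha, deltas, steps=100):
    """Grid search for (d0, d1) minimizing the worst case loss over deltas."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for d0 in grid:
        for d1 in grid:
            worst = max(binary_context_loss(alpha, dl, d0, d1) for dl in deltas)
            if best is None or worst < best[0]:
                best = (worst, d0, d1)
    return best

worst, d0, d1 = minimax_single_context(0.25, (0.1, 0.2))
print(worst, d0, d1)
```

For $\alpha = 1/4$ and crossover probabilities $\{.1, .2\}$, the search settles on never outputting 1 after observing 0 and outputting 1 after observing 1 roughly half of the time.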

4 Performance Criterion 

In the setting of [15], the known channel setting, performance is measured by expected loss and optimal
performance is characterized via the Bayes Envelope. In that setting, with the expected loss performance
measure, a denoiser which achieves the Bayes Envelope is optimal. However, as the following example
illustrates, this performance measure and guarantee are not relevant for the unknown channel setting.

Example 1 Let $\mathbf{Z}$ be a binary source, $\mathbf{X}$, corrupted by a BSC with unknown crossover probability $\delta \in A =
\{.1, .2\}$. Furthermore, $\mathbf{Z}$ is known to be a Bernoulli process with parameter 1/4. Therefore, we know that $\mathbf{X}$
is also a Bernoulli process with parameter $\alpha < 1/4$. We want to reconstruct $X^n$ from $Z^n$ with respect to the
Hamming loss function. Let us examine the two possible cases:

1. The channel crossover probability $\delta$ is .1. Since the Bernoulli process $\mathbf{Z}$ has parameter 1/4, we determine
that $\alpha = .1875$. Since $\alpha > \delta$, it is readily seen that in order to minimize expected loss, we should
reconstruct $X_i$ with the observation $Z_i$. This scheme achieves the Bayes Envelope for a BSC with $\delta = .1$.

2. The channel crossover probability $\delta$ is .2. Since the Bernoulli process $\mathbf{Z}$ has parameter 1/4, we determine
that $\alpha \approx .0833$. Since $\alpha < \delta$, it is readily seen that in order to minimize expected loss, we should
reconstruct $X_i$ with 0 regardless of the observed $Z_i$. The optimality of this reconstruction scheme stems
from the fact that when $\alpha < \delta$, an observed 1 in the channel output is more likely to be caused by the
BSC than the source. Similarly to our previous case, this scheme achieves the Bayes Envelope for a
BSC with $\delta = .2$.

We also observe that the optimal scheme for one case is suboptimal for the other. 
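
The two cases of Example 1 can be reproduced with a short calculation. The Python sketch below recovers the source parameter $\alpha$ from the output parameter $1/4$ for each candidate $\delta$, and then picks, for each observed symbol, the reconstruction with the larger posterior probability. The function names are hypothetical, and the posterior comparison is the standard symbol-by-symbol rule for Hamming loss on a memoryless source.

```python
from fractions import Fraction

def source_parameter(q, delta):
    """Solve q = alpha(1 - delta) + (1 - alpha) delta for alpha."""
    return (q - delta) / (1 - 2 * delta)

def bayes_rule(q, delta):
    """Map each observed symbol to the reconstruction minimizing Hamming loss."""
    alpha = source_parameter(q, delta)
    post_given_1 = alpha * (1 - delta) / q       # P(X = 1 | Z = 1)
    post_given_0 = alpha * delta / (1 - q)       # P(X = 1 | Z = 0)
    half = Fraction(1, 2)
    return {0: int(post_given_0 > half), 1: int(post_given_1 > half)}

q = Fraction(1, 4)
for delta in (Fraction(1, 10), Fraction(1, 5)):
    print(delta, source_parameter(q, delta), bayes_rule(q, delta))
```

For $\delta = .1$ the rule maps each observation to itself, while for $\delta = .2$ it maps both observations to 0, matching the two cases above.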

From Example 1 we see that although one can achieve the Bayes Envelope for each channel in the
uncertainty set, there may not be one denoiser that can achieve the Bayes Envelope for each channel
simultaneously. In particular, there does not exist a denoiser which is simultaneously optimal for the two possible
channels in Example 1. It is therefore problematic to compare various denoisers in the unknown channel
setting using expected loss as a performance measure. How would one rank the two denoising schemes
suggested in Example 1? Each scheme is optimal for one of the two possible channels, but suboptimal for the
other. This difficulty also leads to an ambiguity in defining an optimal denoiser.

Clearly, a new performance measure is needed for our setting of the unknown channel. Without any
prior on the uncertainty set, a natural performance measure which is applicable in this setting is a min-max,
or worst case measure. In other words, we look at the worst case expected loss of a denoiser across all
possible channels in the uncertainty set A. Such a performance measure would take into account the entire
uncertainty set. With this in mind, we can define our performance measure. Before doing so we need to
introduce some notation. For $x^n, z^n \in \mathcal{A}^n$, given a $k$-order sliding window denoiser $f$ we denote

$$L_f(x^n, z^n) = \frac{1}{n} \sum_{i=k+1}^{n-k} \sum_{a \in \mathcal{A}} \Lambda(x_i, a)\, f(z_{i-k}^{i+k})[a], \qquad (12)$$

the normalized loss^8 when employing the sliding window denoiser $f$. Here we make the assumption that
$k < n$. Furthermore, given a channel $\Pi$ and a source distribution $P_{\mathbf{X}}$, $P_{[P_{\mathbf{X}}, \Pi]}$ will denote the joint distribution
on $(\mathbf{X}, \mathbf{Z})$ when $\mathbf{X} \sim P_{\mathbf{X}}$ and $\mathbf{Z}$ is the output of the channel $\Pi$ with input $\mathbf{X}$. Given an uncertainty set A,
we now define our performance measure as follows:

$$\mathcal{L}_f^{(n)}(P_{\mathbf{Z}}, A, \mathbf{Z}) = \sup_{\{(P_{\mathbf{X}}, \Pi) \,:\, \Pi \in A,\; \Pi * P_{\mathbf{X}} = P_{\mathbf{Z}}\}} E_{[P_{\mathbf{X}}, \Pi]}\left[L_f(X^n, Z^n) \,\middle|\, \mathbf{Z}\right], \qquad (13)$$

where $E_{[P_{\mathbf{X}}, \Pi]}[\,\cdot\, | \mathbf{Z}]$ denotes the conditional expectation, with respect to the joint distribution $P_{[P_{\mathbf{X}}, \Pi]}$, given
$\mathbf{Z}$. In words, for a given denoiser $f$, an uncertainty set A, and the noise-corrupted source $\mathbf{Z}$, $\mathcal{L}_f^{(n)}(P_{\mathbf{Z}}, A, \mathbf{Z})$
is the worst case expected loss of the denoiser $f$ over all feasible channels in the uncertainty set A, given $\mathbf{Z}$.
The performance measure in (13) is conditioned on the noise-corrupted sequence $\mathbf{Z}$ since it seems natural
that the performance of a denoiser be determined on the basis of the actual source realization, rather than
merely on its distribution. Although the performance measure is defined using this conditioning, in Sections 5
and 6 performance guarantees are given for both the conditional performance measure and a non-conditional
version.

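^8 Up to the "edge-effects" associated with indices $i$ outside the range $k + 1 \le i \le n - k$ that will be asymptotically
inconsequential in our analysis (which will assume $k \ll n$).

Equipped with our new performance measure, we can now compare the two denoising schemes suggested
in Example 1. Let $f_1$ and $f_2$ denote the denoising scheme of Case 1 and Case 2, respectively, i.e. $f_1$ is the
"say what you see" scheme and $f_2$ is the "say all zeros" scheme. Furthermore, given the Bernoulli process
$\mathbf{Z}$, let $N_1(Z^n)$ be the frequency of ones in $Z^n$. We see that, for any $n$,

$$\begin{aligned}
\mathcal{L}_{f_1}^{(n)}(P_{\mathbf{Z}}, \{.1, .2\}, \mathbf{Z})
&= \max_{\delta \in \{.1, .2\}} E\left[\frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i \ne Z_i\}} \,\middle|\, \mathbf{Z}\right] \\
&= \max_{\delta \in \{.1, .2\}} \frac{1}{n} \sum_{i=1}^{n} E\left[1_{\{X_i \ne Z_i\}} \,\middle|\, \mathbf{Z}\right] \\
&= \max_{\delta \in \{.1, .2\}} \frac{1}{n} \sum_{i=1}^{n} E\left[1_{\{X_i \ne Z_i\}} \,\middle|\, Z_i\right] \\
&= \max_{\delta \in \{.1, .2\}} N_1(Z^n) \Pr\{X_0 \ne Z_0 | Z_0 = 1\} + (1 - N_1(Z^n)) \Pr\{X_0 \ne Z_0 | Z_0 = 0\} \\
&= \max_{\delta \in \{.1, .2\}} N_1(Z^n) \frac{\Pr\{X_0 = 0, Z_0 = 1\}}{\Pr\{Z_0 = 1\}} + (1 - N_1(Z^n)) \frac{\Pr\{X_0 = 1, Z_0 = 0\}}{\Pr\{Z_0 = 0\}} \\
&= \max_{\delta \in \{.1, .2\}} \delta \left( N_1(Z^n) \frac{\Pr\{X_0 = 0\}}{\Pr\{Z_0 = 1\}} + (1 - N_1(Z^n)) \frac{\Pr\{X_0 = 1\}}{\Pr\{Z_0 = 0\}} \right)
\end{aligned}$$

and that

$$\mathcal{L}_{f_2}^{(n)}(P_{\mathbf{Z}}, \{.1, .2\}, \mathbf{Z}) = \max_{\delta \in \{.1, .2\}} N_1(Z^n) \Pr\{X_0 = 1 | Z_0 = 1\} + (1 - N_1(Z^n)) \Pr\{X_0 = 1 | Z_0 = 0\}.$$

The strong law of large numbers states that as $n \to \infty$, $N_1(Z^n)$ converges to $\Pr\{Z_0 = 1\}$ w.p. 1. Therefore,
for large $n$,

$$\mathcal{L}_{f_1}^{(n)}(P_{\mathbf{Z}}, \{.1, .2\}, \mathbf{Z}) \approx .2$$
$$\mathcal{L}_{f_2}^{(n)}(P_{\mathbf{Z}}, \{.1, .2\}, \mathbf{Z}) \approx .1875$$

with high probability. Can we find a denoiser that does better than the two suggested in Example 1?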
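
The two limiting values can be checked by tabulating the expected Hamming loss of each scheme under each channel and taking the maximum over channels. In the Python sketch below, the limiting loss of "say what you see" is taken to be $\Pr\{X_0 \ne Z_0\} = \delta$ and that of "say all zeros" to be $\Pr\{X_0 = 1\} = \alpha$, which is what the expressions above reduce to once $N_1(Z^n)$ is replaced by $\Pr\{Z_0 = 1\}$. The names are chosen for this illustration only.

```python
from fractions import Fraction

Q = Fraction(1, 4)                       # parameter of the observed process
DELTAS = (Fraction(1, 10), Fraction(1, 5))

def alpha(delta):
    """Source parameter implied by the output parameter Q and the BSC."""
    return (Q - delta) / (1 - 2 * delta)

def loss_say_what_you_see(delta):
    """Limiting loss of f_1: the probability that the channel flips a symbol."""
    return delta

def loss_say_all_zeros(delta):
    """Limiting loss of f_2: the probability that the clean symbol is 1."""
    return alpha(delta)

worst_f1 = max(loss_say_what_you_see(d) for d in DELTAS)
worst_f2 = max(loss_say_all_zeros(d) for d in DELTAS)
print(float(worst_f1), float(worst_f2))
```

The worst case for $f_1$ occurs at $\delta = .2$ and the worst case for $f_2$ at $\delta = .1$, that is, each scheme is penalized by the channel for which it is not the Bayes-optimal choice.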

One possible way to improve denoiser performance in Example ^ is to time share between the two 
suggested dcnoisers schemes, "say what you sec" and "say all zeros." For 7 e [0, 1], let f^'''' be a denoiser 
which at each reconstruction implements "say what you sec" with probability 7 and "say all zeros" with 
probability 1 — 7. To simplify our calculations, wc will assume that n is large enough such that iVi(Z") is 
close to Pr{Zo = 1} with high probability. We can now calculate the performance of this denoiser as follows: 

4w(^z,{.l,.2},Z) « max 7Pr{X,^Za + (l-7)Pr{A^ = 0} 
J Se{.i...2} 

= max{.l7 + .1875(1 - 7), .27 + .0833(1 - 7)} 
= max {.08757 + .1875, .II687 + .0833} , 

with high probability. We can then find the best such denoiser by finding the $\gamma$ which minimizes the worst case loss. It is easily seen that, with high probability,

$$\min_{\gamma\in[0,1]} \mathcal{L}^{(n)}_{f^{(\gamma)}}(P_Z,\{.1,.2\},Z) \approx \min_{\gamma\in[0,1]} \max\{-.0875\gamma + .1875,\ .1168\gamma + .0833\} = .1428,$$

and that the minimum is achieved by $\gamma = .5101$.
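The minimization over $\gamma$ can be reproduced in exact arithmetic. The following is a minimal sketch, again assuming $\Pr\{Z_0=1\}=1/4$; the function names are hypothetical. Because the worst-case loss is the maximum of two linear functions of $\gamma$, its minimum over $[0,1]$ lies at an endpoint or at the crossing point, so only those candidates are evaluated. In exact fractions the optimum is $\gamma = 25/49 \approx .5102$ with loss $1/7 \approx .1429$, which agrees with the quoted $.5101$ and $.1428$ up to the rounding of the coefficients.

```python
from fractions import Fraction as F


def prior(q, d):
    """Pr{X_0 = 1} implied by output frequency q and BSC crossover d."""
    return (q - d) / (1 - 2 * d)


def worst_case(gamma, q, deltas):
    """Worst-case loss of the time-sharing denoiser over the channel set."""
    return max(gamma * d + (1 - gamma) * prior(q, d) for d in deltas)


def best_time_share(q, d1, d2):
    """Minimize the max of two lines in gamma over [0, 1]."""
    p1, p2 = prior(q, d1), prior(q, d2)
    candidates = [F(0), F(1)]
    slope_gap = (d2 - p2) - (d1 - p1)
    if slope_gap != 0:
        crossing = (p1 - p2) / slope_gap  # where the two lines meet
        if 0 <= crossing <= 1:
            candidates.append(crossing)
    gamma = min(candidates, key=lambda g: worst_case(g, q, (d1, d2)))
    return gamma, worst_case(gamma, q, (d1, d2))


gamma_star, loss_star = best_time_share(F(1, 4), F(1, 10), F(1, 5))
print(float(gamma_star), float(loss_star))
```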

We see then that, for typical $z$, $f^{(.5101)}$ is a better denoiser than $f_1$ and $f_2$, but what is the best denoiser? To answer this question, we develop the concept of an optimal denoiser under the worst case loss performance measure defined in (13). First, recall that $\mathcal{F}_k$ denotes the set of all $k$-order sliding window denoisers. Now define

$$\mu_k^{(n)}(P_Z,\Delta,Z) = \min_{f\in\mathcal{F}_k} \mathcal{L}_f^{(n)}(P_Z,\Delta,Z), \tag{14}$$
$$\mu_k(P_Z,\Delta,Z) = \limsup_{n\to\infty} \mu_k^{(n)}(P_Z,\Delta,Z). \tag{15}$$

In words, $\mu_k^{(n)}(P_Z,\Delta,Z)$ is the performance of the best $k$-order sliding window denoiser operating on blocks of size $n$. We then take $n\to\infty$ to define $\mu_k(P_Z,\Delta,Z)$, the performance of the best $k$-order sliding window denoiser. Finally we let $k\to\infty$ and define the "sliding window minimum loss,"

$$\mu(P_Z,\Delta,Z) = \lim_{k\to\infty} \mu_k(P_Z,\Delta,Z), \tag{16}$$

where the limit is actually an infimum since for every $Z$, $\mu_k(P_Z,\Delta,Z)$ is pointwise non-increasing with $k$. In words, $\mu(P_Z,\Delta,Z)$ is the performance of the best sliding window denoiser of any order. Hence $\mu(P_Z,\Delta,Z)$ is a bound on the performance of any sliding window denoiser. We call a denoiser optimal if it achieves this performance bound $P_Z$-a.s.; the need for an almost sure statement comes from the fact that both the performance bound and measure depend on the source realization. Surprisingly, it can be shown that the denoiser $f^{(.5101)}$ defined above is optimal for the Example, i.e., with high probability it comes close to attaining the minimum in (14) for all $k$. This is due to the memorylessness of the source in the Example.

One can consider $\mu(P_Z,\Delta,Z)$ defined in (14) as a kind of analogue in our setup to the "sliding window minimum loss" of [15, Section 5] which, in turn, is analogous to the finite-state compressibility of [18], the finite-state predictability of [7], and the conditional finite-state predictability of [14].

5 Performance Guarantees 

In this section we present a result on the performance of the algorithm presented in Section 3 with respect to the performance measure discussed in the previous section.

Throughout this section the uncertainty set $\Delta$ is assumed to be finite. Additionally, to isolate the main issue of minimizing the worst case performance from the issue of estimating the set of channels in the uncertainty set, we limit our first theorem to the case where all channels in the uncertainty set are known to be feasible, namely they satisfy $\Delta \subseteq \mathcal{C}_\infty(P_Z)$.

In particular, all $z$ with $\lim_{n\to\infty} N_1(z^n) = 1/4$.
Although $\mu_k^{(n)}$ is defined as a minimum over an uncountable set, it is easily seen to be pointwise equal to $\min_{f:\mathcal{A}^{2k+1}\to\mathcal{S}_Q} \mathcal{L}_f^{(n)}(P_Z,\Delta,Z)$, where we use $\mathcal{S}_Q$ to denote the subset of $\mathcal{S}$ consisting of distributions with rational components. The latter is a minimum over a countable set of random variables and hence measurable.
Theorem 1 Let



where on the right side is the n-block denoiser defined in ^ and let {fc„} be any sequence satisfying fc„ < 
le'i'n"^! • ^'^^ output distribution Pz such that A C Cao{Pz), 

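$$\lim_{n\to\infty}\left[\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta,Z) - \mu_{k_n}^{(n)}(P_Z,\Delta,Z)\right] = 0 \quad P_Z\text{-a.s.} \tag{17}$$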

We defer the proof of Theorem 1 to the appendix.

Remarks: Note that beyond the stipulation $\Delta\subseteq\mathcal{C}_\infty(P_Z)$, no other assumption is made on $P_Z$, not even stationarity. Note also that, as a direct consequence of (14), we have for each $n$ and all possible realizations of $Z$,

$$\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta,Z) \ge \mu_{k_n}^{(n)}(P_Z,\Delta,Z).$$

Thus, the non-trivial part of (17) is that

$$\limsup_{n\to\infty}\left(\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta,Z) - \mu_{k_n}^{(n)}(P_Z,\Delta,Z)\right) \le 0 \quad P_Z\text{-a.s.}$$

An immediate consequence of Theorem 1 is:

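Corollary 1 Let the setting of Theorem 1 hold and $k_n\to\infty$. For any $P_Z$ such that $\Delta\subseteq\mathcal{C}_\infty(P_Z)$,

$$\limsup_{n\to\infty} \mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta,Z) \le \mu(P_Z,\Delta,Z) \quad P_Z\text{-a.s.} \tag{18}$$

Proof: We have $P_Z$-a.s.,

$$\limsup_{n\to\infty} \mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta,Z) = \limsup_{n\to\infty} \mu_{k_n}^{(n)}(P_Z,\Delta,Z) \le \mu(P_Z,\Delta,Z), \tag{19}$$

where the equality follows from Theorem 1. The inequality comes from the fact that for any fixed $k$, since $k_n$ increases without bound,

$$\limsup_{n\to\infty} \mu_{k_n}^{(n)}(P_Z,\Delta,Z) \le \mu_k(P_Z,\Delta,Z).$$

Therefore the left side is also upper bounded by $\inf_{k\ge 1}\mu_k(P_Z,\Delta,Z) = \mu(P_Z,\Delta,Z)$.

□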

Corollary 1 states that asymptotically, in $n$ and the window size, the sliding window denoiser of Section 3 achieves the performance bound $\mu(P_Z,\Delta,Z)$ $P_Z$-a.s. The denoising scheme is therefore asymptotically optimal with respect to the worst case performance measure described in Section 4.

We also establish the following consequence of Theorem 1:

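Corollary 2 Let $P_Z$ be stationary and ergodic, $\Delta$ be finite, and $\hat{X}^n_{\mathrm{univ}}$ be defined as in Theorem 1 with $k_n = k$. If $\Delta\subseteq\mathcal{C}_\infty(P_Z)$, then

$$\lim_{n\to\infty}\left[\max_{\{(P_X,\Pi):\,\Pi\in\Delta,\ \Pi * P_X = P_Z\}} E_{[P_X,\Pi]}\left[L_{\hat{X}^n_{\mathrm{univ}}}(X^n,Z^n)\right] - \min_{f\in\mathcal{F}_k}\ \max_{\{(P_X,\Pi):\,\Pi\in\Delta,\ \Pi * P_X = P_Z\}} E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\right]\right] = 0.$$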
For proof of Corollary 2 see the appendix.

Note that the difference between the kind of statement in Theorem 1 and that in Corollary 2 is that in the latter we omit the conditioning on the noise-corrupted sequence $Z$. The latter can be viewed as the analogue in our setting of the expectation results of [15], while the statement of Theorem 1 is more in the spirit of the semi-stochastic setting of [15].

6 Performance Guarantees For the General Case 

In Section 5 we assumed that $|\Delta|$ was finite and that all channels in $\Delta$ are feasible. These two assumptions allowed us to avoid a few technicalities. In this section, we will remove these assumptions and extend the performance guarantees of Section 5 to the case where $\Delta$ is an infinite set, and we no longer require that $\Delta\subseteq\mathcal{C}_\infty(P_Z)$. To preserve the concept of invertibility, we require that $\max_{\Pi\in\Delta}\|\Pi^{-1}\|$ be finite.
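To make the feasibility requirement concrete, consider the binary memoryless setting of the Example. Under the assumption that $Z$ is i.i.d. Bernoulli with $\Pr\{Z_0=1\}=q$, a BSC with crossover $\delta$ is feasible exactly when inverting the channel yields a valid source probability. The sketch below illustrates this special case only; `bsc_feasible` is a hypothetical helper, not the general feasibility test of the paper.

```python
def bsc_feasible(q, delta):
    """Return True if some Bernoulli(p) source sent through a BSC(delta)
    produces an output with Pr{Z = 1} = q (memoryless binary case only)."""
    if delta == 0.5:
        # Non-invertible channel: the output is always Bernoulli(1/2).
        return q == 0.5
    p = (q - delta) / (1.0 - 2.0 * delta)  # candidate Pr{X = 1}
    return 0.0 <= p <= 1.0


for delta in (0.1, 0.2, 0.3):
    print(delta, bsc_feasible(0.25, delta))
```

With $q = 1/4$, crossover probabilities above $1/4$ (and below $1/2$) are infeasible, so an uncertainty set containing them would not satisfy the feasibility assumption of Section 5.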

Before continuing, it is important to identify the issues that arise when we remove these two key assumptions. In (13), our performance measure $\mathcal{L}$ is defined to be the supremum of $E_{[P_X,\Pi]}[L_f(X^n,Z^n)\mid Z]$ over the set of feasible channels in $\Delta$. Although $E_{[P_X,\Pi]}[L_f(X^n,Z^n)\mid Z]$ is a measurable function for each $\Pi\in\Delta$, if $\Delta$ is an uncountable set, we are no longer assured that the supremum in (13) is measurable. Initially, to avoid this complication we made the assumption of $|\Delta|$ being finite.

To deal with this measurability issue in the development of $\mathcal{L}$, one may consider those channels in $\Delta$ which have rational transition matrices. Let $\mathcal{Q}(\mathcal{A})$ be the subset of channels in $\mathcal{C}(\mathcal{A})$ whose transition matrices have rational components. Then given an uncountable uncertainty set $\Delta$, we can look at $\mathcal{L}_f^{(n)}(P_Z,\Delta\cap\mathcal{Q}(\mathcal{A}),Z)$. Since $\Delta\cap\mathcal{Q}(\mathcal{A})$ is a countable set, we are assured that $\mathcal{L}_f^{(n)}(P_Z,\Delta\cap\mathcal{Q}(\mathcal{A}),Z)$ is well defined. Using this modification, we can extend the definition of $\mu_k$ and $\mu$. Similarly, we can use this approach in the construction of our denoiser $\hat{X}^{n,k}$. We therefore assume that $\Delta\subseteq\mathcal{Q}(\mathcal{A})$.

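The other assumption made in Section 5 is that all channels in $\Delta$ are feasible. We can remove this condition if $\Delta$ is sufficiently well behaved in the following sense:

Assumption 1 Given a set $A$, let $\overline{A}$ denote its closure. For every stationary process $U$,

$$\bigcap_{l=1}^{\infty}\overline{\Delta\cap\mathcal{C}_l(P_{U_{-l}^{l}})} = \overline{\bigcap_{l=1}^{\infty}\Delta\cap\mathcal{C}_l(P_{U_{-l}^{l}})}$$

and

$$\rho\left(\Delta\cap\mathcal{C}_\infty(U),\ \Delta\cap\mathcal{C}_l(U)\right)$$

is continuous in $U$ for all $l$.

Assumption 2 For each $l$ there exists a function $b_l$ satisfying $b_l(\varepsilon)\downarrow 0$ as $\varepsilon\downarrow 0$ and

$$\rho\left(\Delta\cap\mathcal{C}_l(P_{U_{-l}^{l}}),\ \Delta\cap\mathcal{C}_l(P_{\tilde{U}_{-l}^{l}})\right) \le b_l\left(\left\|P_{U_{-l}^{l}} - P_{\tilde{U}_{-l}^{l}}\right\|\right). \tag{20}$$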

Assumption 1 imposes a structural constraint on $\Delta$ while Assumption 2 gives us a form of continuity. To illustrate these two assumptions, let us explore the binary case. Let $\Delta$ consist of all BSCs with rational crossover probability less than some $\delta_0 < 1/2$. It is easy to see that any such $\Delta$ satisfies Assumption 1.
Furthermore, Assumption |21 is satisfied with bi{e) ~ Jj~^s~)^' ^^'^^^^ generally, if A consists of all channels 
in Q{A) within a certain radius of the noise free channel, then 

&z(e)=£(max||n"i||)l-^l' (21) 

satisfies Assumption |21 

Before we state the next performance guarantee, we need to introduce the notion of $\psi$-mixing. Roughly, the $i$-th $\psi$-mixing coefficient of a stationary source $P_Z$ is defined as the maximum value of the distance between the value 1 and the Radon-Nikodym derivative between $P_{Z_{-\infty}^{0},Z_i^{\infty}}$ and the product distribution $P_{Z_{-\infty}^{0}}\times P_{Z_i^{\infty}}$ (cf. [3] for a rigorous definition). In our finite-alphabet setting, the $i$-th $\psi$-mixing coefficient associated with a given stationary source $P_Z$ is more simply given by

$$\sup_{k\ge 1}\ \max_{z_{-k}^{0},\,z_i^{i+k}}\ \left|\frac{P_Z(z_{-k}^{0},\,z_i^{i+k})}{P_Z(z_{-k}^{0})\,P_Z(z_i^{i+k})} - 1\right|.$$

Qualitatively, the $\psi$-mixing coefficients are a measure of the effective memory of a process. For a given sequence of nonnegative reals $\{\psi_i\}$ we let $\mathcal{S}_\psi$ denote all stationary sources whose $i$-th $\psi$-mixing coefficient is bounded above by $\psi_i$ for all $i$.

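Theorem 2 Let $\{\psi_i\}$ be a sequence of nonnegative reals with $\psi_i\to 0$ and let $\Delta\subseteq\mathcal{Q}(\mathcal{A})$ satisfy Assumptions 1 and 2. There exist unbounded sequences $\{l_n\}$ and $\{k_n\}$ such that if

$$\hat{X}^n_{\mathrm{univ}} = \hat{X}^{n,k_n,l_n}_{\Delta}$$

then for any $P_Z\in\mathcal{S}_\psi$ and any sequence $\{\Delta_n\}$ with $\Delta_n\subseteq\Delta$ and $|\Delta_n| = O(e^{\sqrt{n}})$,

$$\limsup_{n\to\infty}\left[\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta_n,Z) - \mu_{k_n}^{(n)}(P_Z,\Delta,Z)\right] \le 0 \quad P_Z\text{-a.s.} \tag{22}$$

The proof of Theorem 2 makes use of a more general result, Lemma 7. Lemma 7 and the proof of Theorem 2 can be found in the appendix.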

Remarks:

• The explicit dependence of $\{l_n\}$ and $\{k_n\}$ on $\{\psi_i\}$ is given in the proof.

• If $\psi_i = e^{-\rho i}$ for some $\rho > 0$ then any $w_n = o(\log n)$ will do.

• Any Markov source of any order with no restricted transitions, as well as any finite-state hidden Markov process whose underlying state sequence has no restricted transitions, is exponentially mixing, i.e., belongs to $\mathcal{S}_\psi$ with $\psi_i = e^{-\rho i}$ for some $\rho > 0$ (cf. |S]).
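The exponential mixing of Markov sources can be illustrated on a small example. The sketch below assumes the standard ratio form of the $\psi$-mixing coefficient; for a stationary first-order Markov chain with strictly positive transition matrix $P$ and stationary law $\pi$, the ratio depends only on the two boundary symbols, giving $\max_{a,b}|P^i(a,b)/\pi(b) - 1|$. This reduction and the helper names are assumptions made for illustration, not formulas quoted from the paper.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]


def psi_coefficient(P, pi, i):
    """i-th psi-mixing coefficient (i >= 1) of a stationary Markov chain
    with positive transition matrix P and stationary distribution pi."""
    power = P
    for _ in range(i - 1):
        power = mat_mul(power, P)  # power becomes the i-step matrix P^i
    n = len(P)
    return max(abs(power[a][b] / pi[b] - 1.0)
               for a in range(n) for b in range(n))


# Symmetric binary Markov chain that flips state with probability q.
q = 0.2
P = [[1 - q, q], [q, 1 - q]]
pi = [0.5, 0.5]
coeffs = [psi_coefficient(P, pi, i) for i in range(1, 6)]
print(coeffs)
```

For this symmetric chain the coefficients equal $|1-2q|^i$, a geometric decay of the kind described in the last remark.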

Analogously to Corollary 2, we can extend the results of Theorem 2 as follows:

Proposition 1 Let $\{\psi_i\}$ be a sequence of nonnegative reals with $\psi_i\to 0$ and assume finite $\Delta$. There exist unbounded sequences $\{l_n\}$ and $\{k_n\}$ such that if

$$\hat{X}^n_{\mathrm{univ}} = \hat{X}^{n,k_n,l_n}_{\Delta}$$

then for any $P_Z\in\mathcal{S}_\psi$

$$\lim_{n\to\infty}\left[\max_{\{(P_X,\Pi):\,\Pi\in\Delta,\ P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_{\hat{X}^n_{\mathrm{univ}}}(X^n,Z^n)\right] - \min_{f\in\mathcal{F}_{k_n}}\ \max_{\{(P_X,\Pi):\,\Pi\in\Delta,\ P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\right]\right] = 0. \tag{23}$$

We defer the proof of Proposition 1 to the appendix.

As in Corollary 2, Proposition 1 gives a performance guarantee under the strict expectation criterion, i.e., when the maximization is over expectations rather than conditional expectations. It implies that under benign assumptions on the process, optimality with respect to the latter suffices for optimality with respect to the former.

7 Conclusion 

In the discrete denoising problem, it is not always realistic to assume full knowledge of the channel charac- 
teristics. In this paper, we have presented a denoising scheme designed to operate in the presence of such 
channel uncertainty. We have proposed a worst case performance measure, argued its relevance for this 
setting, and established the universal asymptotic optimality of the suggested schemes under this criterion. 

The schemes presented in this work can be practically implemented by identifying the problem of finding the minimizer that defines the denoiser with optimization problems that can be solved efficiently. The implementation aspects, along with experimental results on real and simulated data that seem to be indicative of the potential of these schemes to do well in practice, are presented elsewhere.
Appendix

A Technical Lemmas

In this section several technical lemmas are presented that are needed for the proofs of the main results. Before continuing, we define $\Lambda_{\max} = \max_{a,b}\Lambda(a,b)$.

The first lemma states that for any source and channel $\Pi$, $G_k(Q^{2k+1}[Z^n],\Pi,f)$ is a very efficient estimate of $L_f(X^n,Z^n)$. In fact, it is uniformly efficient in all sources, channels, and sliding window functions $f$.

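Lemma 1 For all $P_X\in\mathcal{S}^\infty$, $\Pi\in\mathcal{Q}(\mathcal{A})$, $n > 2k$, $f\in\mathcal{F}_k$, and $\delta > 0$

$$P_{[P_X,\Pi]}\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| > \delta\right) \le \exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right],$$

where $A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)$ can be taken as any function satisfying

$$2(2k+1)|\mathcal{A}|^{2k+2}\exp\left(-\frac{2\delta^2(n-2k)}{(2k+1)|\mathcal{A}|^{4k+4}(\Lambda_{\max}\|\Pi^{-1}\|)^2}\right) \le \exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right]. \tag{24}$$

Remark: We shall assume below that the $A$ chosen to satisfy (24) is non-decreasing in $\delta$ and non-increasing in $\|\Pi^{-1}\|$.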
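To get a feel for the left side of (24), it can be evaluated numerically. The sketch below assumes the reading of (24) with prefactor $2(2k+1)|\mathcal{A}|^{2k+2}$ and exponent denominator $(2k+1)|\mathcal{A}|^{4k+4}(\Lambda_{\max}\|\Pi^{-1}\|)^2$; the function name and the sample parameter values (binary alphabet, $\|\Pi^{-1}\| = 1.25$) are illustrative assumptions.

```python
import math


def lemma1_bound(n, k, delta, alphabet_size, loss_max, pi_inv_norm):
    """Evaluate the left side of (24): a bound on the probability that
    the estimate G_k deviates from the true loss L_f by more than delta."""
    m = alphabet_size
    prefactor = 2 * (2 * k + 1) * m ** (2 * k + 2)
    denom = (2 * k + 1) * m ** (4 * k + 4) * (loss_max * pi_inv_norm) ** 2
    return prefactor * math.exp(-2.0 * delta ** 2 * (n - 2 * k) / denom)


# Binary alphabet, window order k = 1, deviation 0.05, Hamming loss.
for n in (10 ** 6, 2 * 10 ** 6, 4 * 10 ** 6):
    print(n, lemma1_bound(n, 1, 0.05, 2, 1.0, 1.25))
```

The bound decays exponentially in $n$ for fixed $k$, but the factors $|\mathcal{A}|^{2k+2}$ and $|\mathcal{A}|^{4k+4}$ make it informative only for large blocks, which is why $k$ must grow slowly with $n$ in the main results.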

Proof: 

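We shall establish the Lemma by conditioning on the source sequence. Indeed, it will be enough to show that for all $P_X\in\mathcal{S}^\infty$, $\Pi\in\mathcal{Q}(\mathcal{A})$, $f\in\mathcal{F}_k$, $\delta > 0$, and all $x^n\in\mathcal{A}^n$

$$P_{[P_X,\Pi]}\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| > \delta \,\middle|\, X^n = x^n\right) \le 2(2k+1)|\mathcal{A}|^{2k+2}\exp\left(-\frac{2\delta^2(n-2k)}{(2k+1)|\mathcal{A}|^{4k+4}(\Lambda_{\max}\|\Pi^{-1}\|)^2}\right). \tag{25}$$

Note that when conditioning on $x^n$ in (25), $Z^n$ is a sequence of independent components, with $Z_i\sim\Pi(x_i,\cdot)$. Now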



/ ^ n— 

E El" 

n— A; 



n 



-T 



n - 2k 



i=k+l 



1 



2k 



i—k 



+1 ^Ifc.zfe^'' ^£-4 



E E E \^'^hz.=.\^,.^} TT, [A . /([zZL z, zf])] 



(26) 



where: 



Q'^^^^[Z"] z^^^^-i denotes the conditional distribution vector of Zq\Z_1, = z_l,Z^ = induced by 



• 2*+''} stands for the |^|-dimensional column vector whose a-tli component is zero unless 

^i^k = {zlZl,a, zl^l'i) in which case it is 1. 

On the other hand, 

i^k-\-l aeA 
11— k 

— - E [A 



n-2k 
1 



i=A;+l 
n — k 



n 



- 2k 



E E E [l{z--=(._-,.,.^),..=.} [A • KKl z, zf])] J , (27) 
where $\mathbf{1}_{\{Z_{i-k}^{i+k} = (z_{-k}^{-1},z,z_1^{k}),\ x_i=\cdot\}}$ denotes the $|\mathcal{A}|$-dimensional column vector whose $a$-th component is zero unless both $Z_{i-k}^{i+k} = (z_{-k}^{-1},z,z_1^{k})$ and $x_i = a$, in which case it is 1. From (26), (27) and the triangle inequality it follows that


Gk (g2fe+i[z"],n,/) -L/(x",z")| 



^ n—k 



H-4|Aniax ^ ^ max 



6^*= 



1 



^—k 



n~2k 



(a) 



(28) 



Now, for all a;" € contexts z_l,, z^ G ^'^i and z £ ^ we have, 



^[Px,n] 



1— A; 



< 2(2A: + 1) exp 



:k+l 

2e^{n - 2k) 



n-2k ^ V t^I-fc=(2=-fc'^.^f).^t=a} 



> £ 



(29) 



(2fc + i)(||n-i||)2 

We get (|29ll by decomposing the summation inside the probability on the left side of (|29|l into 2fc + 1 
sums of approximately n/(2fc+ 1) independent random variables bounded in magnitude by ||n~"'^||, applying 
Hoeffding's inequality |1UI Th. 1] to each of the sums, and combining via a union bound to obtain (|29|) (cf. 
similar derivations in [S] and Combining (|28|l and (|29ll . with standard applications of the union bound, 

gives 



Pr 



[Px,n] 



> S 



X" = < 2(2fc+l)|^|''''+^cxp 



Amax|-4|2fc + 2 



(n- 2fc) 



(2fc + i)(||n-i| 



which, upon simplification of the expression in the exponent, is exactly (|25|l . 



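Lemma 2 For all $P_Z$, and $P_X\in\mathcal{S}^\infty$, $\Pi\in\mathcal{Q}(\mathcal{A})$ satisfying $P_X * \Pi = P_Z$,

$$P_Z\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\mid Z\right]\right| > \delta\right) \le \exp\left[-n\,B(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right] \tag{30}$$

for all $n > 2k$, $f\in\mathcal{F}_k$, and $\delta > 0$, where $B(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)$ can be taken as any function satisfying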
^-^ " " ■°^^ exp[-nA(fc,J,A,„,„||n-^||)] <exp[-nP(fc,^,A,„,„||n-i||)] . 



(31) 



Remark: Note that the random variables appearing in the probability on the left side of (30) are $Z$-measurable, and hence it suffices to consider the probability measure $P_Z$, which is the noisy marginal of $P_{[P_X,\Pi]}$. We shall assume below that the $B$ chosen to satisfy (31) is non-increasing in $\|\Pi^{-1}\|$. Finally, note that the combination of Lemma 1 and Lemma 2 implies that, for an arbitrary source $P_X$ and channel $\Pi$, $E_{[P_X,\Pi]}[L_f(X^n,Z^n)\mid Z]\approx L_f(X^n,Z^n)$ with high $P_{[P_X,\Pi]}$-probability.
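Proof:

Fix $\varepsilon > 0$. By Lemma 1,

$$P_{[P_X,\Pi]}\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| > \delta\right) \le \exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right],$$

implying, by Chebyshev's inequality,

$$P_Z\left(P_{[P_X,\Pi]}\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| > \delta \,\middle|\, Z\right) > \varepsilon\right) \le \frac{1}{\varepsilon}\exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right]. \tag{32}$$

Now, the fact that $\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| \le |\mathcal{A}|^{2k+2}\|\Pi^{-1}\|\Lambda_{\max}$ implies that

$$\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\mid Z\right]\right| \le \delta + \varepsilon|\mathcal{A}|^{2k+2}\|\Pi^{-1}\|\Lambda_{\max}$$

on the event

$$\left\{P_{[P_X,\Pi]}\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - L_f(X^n,Z^n)\right| > \delta \,\middle|\, Z\right) \le \varepsilon\right\},$$

in turn implying

$$P_Z\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\mid Z\right]\right| > \delta + \varepsilon|\mathcal{A}|^{2k+2}\|\Pi^{-1}\|\Lambda_{\max}\right) \le \frac{1}{\varepsilon}\exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right]$$

when combined with (32). Choosing $\varepsilon$ such that $\delta = \varepsilon|\mathcal{A}|^{2k+2}\|\Pi^{-1}\|\Lambda_{\max}$, this implies

$$P_Z\left(\left|G_k(Q^{2k+1}[Z^n],\Pi,f) - E_{[P_X,\Pi]}\left[L_f(X^n,Z^n)\mid Z\right]\right| > 2\delta\right) \le \frac{|\mathcal{A}|^{2k+2}\|\Pi^{-1}\|\Lambda_{\max}}{\delta}\exp\left[-n\,A(k,\delta,\Lambda_{\max},\|\Pi^{-1}\|)\right],$$

from which an explicit form for the exponent function $B$ in the right side of (30) can be obtained.

□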



The next lemma states that, with high probability, $G_k(Q^{2k+1}[Z^n],\Pi,f)$ estimates $E_{[P_X,\Pi]}[L_f(X^n,Z^n)\mid Z]$ uniformly well, simultaneously for all $f\in\mathcal{F}_k$ and any finite number of pairs $(P_X,\Pi)$ that give rise to $P_Z$.

Lemma 3 For all Pz E , finite JC C Coo{Px), 



Pz max sup 

XfeJ^k {(Px.n):neK;,Px*n=Pz} 



Gk (Q2^-+M^"],n,/) -i?[p,,n] [L/(X",Z")|Z]| >r/ + ^ 



< 



A„,ax(l + |^P"+'maxneyc||n-i| 



|/C| • exp 



-nS ( Aniax,max||n ^| 



(33) 



for all n > 2k and (5, r/ > 0. 



Proof: 

Lemma 2, the union bound, and the fact that $B(k,\delta,\Lambda_{\max},\cdot)$ is non-increasing imply that for any $f\in\mathcal{F}_k$



Pz sup 

\{(Px.n):ne/c,Px*n=Pz} 



Gk (g^'=+i[z"],n,/) -£;[p,,n] [L/(x",z")|z] 



> (5 



< |/C|exp 



-uB ( A,„ax,max||n ^| 



(34) 
For $\varepsilon > 0$ let $\mathcal{S}(\mathcal{A},\varepsilon)$ denote the subset of $\mathcal{S}(\mathcal{A})$ consisting of distributions that assign probabilities that are integer multiples of $\varepsilon$ to each $a\in\mathcal{A}$. Letting $\mathcal{F}_k^{\varepsilon} = \{f:\mathcal{A}^{2k+1}\to\mathcal{S}(\mathcal{A},\varepsilon)\}$, it is then straightforward from the definition of $G_k$ and of $L_f$ that



max sup 

/e^fc {(Px,n):nGK;,Px*n=Pz} 



< max sup 

f^^'i {(Px,n):ne/c,Px*n=Pz} 



+eA^ax l + |^P'^+'max||n-i| 



(35) 



Combining 1(231), and the fact that = \S{A,e] 

V = eA„>ax (1 + 1^1'^+' maxneK l|n-i||), 



11-4 r 



< 



e l-^l^*"^^ yields, for 



Pz max sup 

\feJ^k {(Px,n):ne/c,Px*n=Pz} 



> rj + S 



< |^f||/C|exp 



-nB A:,(5, A,nax,max||n ^| 



< e-l-^l''"^'|^|exp 



which is exactly H33|) since - 



-nB /c, 5, Aniax, max lln 



_ Amax(l + |.A|"''+^niaxngK \\n-^\\) 
~ I? 



Lemma 4 For aZZ Pz e 5°°, finite /C C Coo(Pz), 

Pz (max sup Gk (Q^'+'[Z%n, f) - F[p,,n] Z")|Z] 

\ {(Px,n):n6K;,Px*n=Pz} ^ ^ 



> ^ 



< |/C| • cxp 



-nT { k, S, Aniax, max | |n ^ | 



for all n > 2k and d > 0, where T can be any function satisfying 



2A,nax(l + |^r*+'maxne;c||n 



exp 



~nB /c,(5/2,A„iax,max||n ^1 
' neic 



< cxp 



-nr fc,(5, Amax,max||n 



Proof: 

The assertion follows from Lemma 3 upon assigning $\delta' = \delta/2$, $\eta = \delta/2$, and noting the decreasing monotonicity of $B(k,\delta,\Lambda_{\max},\cdot)$, with $\Gamma$ chosen to be any function satisfying



2A^ax (1 + \A\'"'+^ maxneK; ||n-i| 







exp 


-nB ^ 



neic 



< cxp 



-nF fc,(5, Amax,max||n 
neK. 

(36) 

□ 



Note that in Lemmas 2, 3, and 4, $P_Z$ is a completely arbitrary distribution, which need not even be stationary.

We now define $\Delta_l(Q^{2l+1}[Z^n],\Delta) = \Delta\cap\mathcal{C}_l(Q^{2l+1}[Z^n])$. The denoiser in Section 3 is defined as a function of $\Delta_l(Q^{2l+1}[Z^n],\Delta)$, as opposed to $\Delta\cap\mathcal{C}_\infty(P_Z)$ which would be ideal. Clearly the latter is not possible since $P_Z$ is not known. However we expect that $\Delta_l$ will be close to $\Delta\cap\mathcal{C}_\infty(P_Z)$. This is indeed the case, as quantified in Lemmas 5 and 6 below.
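The set $\Delta_l$ is driven entirely by an empirical statistic of the noisy sequence. As an illustration, and assuming that $Q^{2k+1}[z^n]$ stands for the empirical distribution of the $2k+1$-tuples $z_{i-k}^{i+k}$, $k+1\le i\le n-k$ (the notation is defined outside this section, so this reading is an assumption), it could be computed as follows. The function name is hypothetical.

```python
from collections import Counter


def empirical_block_distribution(z, k):
    """Relative frequencies of the (2k+1)-tuples z[i-k..i+k] over all
    n - 2k window positions of the sequence z."""
    width = 2 * k + 1
    windows = [tuple(z[i:i + width]) for i in range(len(z) - width + 1)]
    counts = Counter(windows)
    total = len(windows)  # equals n - 2k
    return {w: c / total for w, c in counts.items()}


z = [0, 1, 0, 1, 0, 1]
print(empirical_block_distribution(z, 1))
```

A channel $\Pi$ would then belong to $\Delta_l$ when it lies in $\Delta$ and this empirical distribution, with $k$ replaced by $l$, can be obtained by passing some valid $2l+1$-dimensional input distribution through $\Pi$.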

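Before we state our final three lemmas, we need to set up some notation. Denote by $\mathcal{S}\subset\mathcal{S}^\infty$ the set of stationary distributions in $\mathcal{S}^\infty$. Further, for $\alpha = \alpha(n,l,\varepsilon)$, let $\mathcal{S}(\mathcal{A},\alpha)$ denote the set of all $P_Z\in\mathcal{S}$ for which

$$P_Z\left(\left\|P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n]\right\| > \varepsilon\right) \le \alpha(n,l,\varepsilon)$$

holds for all $n, l, \varepsilon$. Note that by the Borel-Cantelli lemma, for $\alpha$ satisfying $\sum_n\alpha(n,l,\varepsilon) < \infty$ for all $l$ and $\varepsilon > 0$, $\mathcal{S}(\mathcal{A},\alpha)$ is a subset of the stationary and ergodic sources. For any $P_Z\in\mathcal{S}$ and uncertainty set $\Delta$, let

$$a_l = a_l(P_Z,\Delta) = \rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta\cap\mathcal{C}_l(P_Z)\right). \tag{37}$$

For a given $\alpha$ define now

$$U_n(k,l,\eta,\delta) = \alpha\left(n,l,b_l^{-1}\left(\phi_k^{-1}(\delta - \phi_k(\eta)) - a_l\right)\right), \tag{38}$$

where $\phi_k$ is as previously defined and $b_l$ is the function associated with Assumption 2. Let further

$$V_n(k,l,\delta) = U_n\left(k,l,\exp(-\sqrt{n}/|\mathcal{A}|^2),\delta\right). \tag{39}$$

Lemma 5 For all $P_Z\in\mathcal{S}$ and $\Delta\subseteq\mathcal{Q}(\mathcal{A})$ with $\max_{\Pi\in\Delta}\|\Pi^{-1}\| < \infty$,

$$P_Z\left(\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta_l(Q^{2l+1}[Z^n])\right) > \delta\right) \le P_Z\left(\left\|P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n]\right\| > b_l^{-1}(\delta - a_l)\right)$$

where $b_l$ and $a_l$ were defined in (20) and (37), respectively.

Proof:

We have

$$
\begin{aligned}
P_Z\left(\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta_l(Q^{2l+1}[Z^n],\Delta)\right) > \delta\right)
&\le P_Z\left(\rho\left(\Delta\cap\mathcal{C}_l(P_Z),\ \Delta_l(Q^{2l+1}[Z^n],\Delta)\right) > \delta - a_l\right) \\
&\le P_Z\left(\left\|P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n]\right\| > b_l^{-1}(\delta - a_l)\right)
\end{aligned}
$$

where the first inequality follows by the definition of $a_l$, as defined in (37), and the triangle inequality, and the second inequality follows from the definition of $b_l$, as defined in (20), and the definition of $\Delta_l$.

□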

Lemma 6 For any $P_Z\in\mathcal{S}$ and $\Delta$ satisfying Assumption 1, the sequence $a_l(P_Z,\Delta)$ defined in (37) satisfies $a_l(P_Z,\Delta)\to 0$ as $l\to\infty$. Furthermore, the convergence is uniform in $P_Z$.
We may suppress the dependence of $\Delta_l$ on $\Delta$ and $Q^{2l+1}[Z^n]$.
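Proof:

The first thing to establish is that the relation

$$\Delta\cap\mathcal{C}_\infty(P_Z) = \bigcap_{k\ge 1}\left(\Delta\cap\mathcal{C}_k(P_Z)\right) = \bigcap_{k\ge 1}\left\{\Pi\in\Delta:\ \exists\, P_{X_{-k}^{k}}\in\mathcal{S}^{2k+1}\ \text{s.t.}\ P_{X_{-k}^{k}} * \Pi = P_{Z_{-k}^{k}}\right\} \tag{40}$$

holds for all stationary $P_Z$. The direction $\subseteq$ is true since obviously

$$\Delta\cap\mathcal{C}_\infty(P_Z) \subseteq \left\{\Pi\in\Delta:\ \exists\, P_{X_{-k}^{k}}\in\mathcal{S}^{2k+1}\ \text{s.t.}\ P_{X_{-k}^{k}} * \Pi = P_{Z_{-k}^{k}}\right\}$$

for every $k$. For the reverse direction note that if $P'_k = P_{Z_{-k}^{k}} * \Pi^{-1}\in\mathcal{S}^{2k+1}$ and $P'_{k+1} = P_{Z_{-k-1}^{k+1}} * \Pi^{-1}\in\mathcal{S}^{2k+3}$ then $P'_k$ is consistent with $P'_{k+1}$, i.e., it is its $(2k+1)$-th order marginal. Thus, if $\Pi$ is in the intersection of the sets on the right side of (40) then $\{P'_k\}_{k\ge 1}$ is a consistent family of distributions so, by Kolmogorov's extension theorem, there exists a unique stationary source $P_X$ with the said distributions as its finite-dimensional marginals. Furthermore, $P_X * \Pi = P_Z$ since $P_{X_{-k}^{k}} * \Pi = P_{Z_{-k}^{k}}$ for each $k$. Thus we have

$$\Delta\cap\mathcal{C}_\infty(P_Z) \supseteq \bigcap_{k\ge 1}\left\{\Pi\in\Delta:\ \exists\, P_{X_{-k}^{k}}\in\mathcal{S}^{2k+1}\ \text{s.t.}\ P_{X_{-k}^{k}} * \Pi = P_{Z_{-k}^{k}}\right\},$$

establishing (40). Now, the fact that $\{\Delta\cap\mathcal{C}_k(P_Z)\}$ is a decreasing sequence and that $\Delta\cap\mathcal{C}_\infty(P_Z)\subseteq\Delta\cap\mathcal{C}_k(P_Z)$ for all $k$ implies existence of the limit $\lim_{k\to\infty}\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\Delta\cap\mathcal{C}_k(P_Z)\right)$. Assume

$$\lim_{k\to\infty}\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta\cap\mathcal{C}_k(P_Z)\right) > 0. \tag{41}$$

Let

$$\gamma = \lim_{k\to\infty}\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta\cap\mathcal{C}_k(P_Z)\right) > 0$$

and define

$$\mathcal{C}_k^{\gamma}(P_Z) = \left\{\pi\in\overline{\Delta\cap\mathcal{C}_k(P_Z)}\ \ \text{s.t.}\ \inf_{\pi'\in\Delta\cap\mathcal{C}_\infty(P_Z)}\|\pi-\pi'\| \ge \frac{\gamma}{2}\right\};$$

here we use the notation $\overline{A}$ for the closure of a set $A$. By definition of $\gamma$, $\mathcal{C}_k^{\gamma}(P_Z)\neq\emptyset$ for all $k$, and since $\mathcal{C}_k(P_Z)\subseteq\mathcal{C}_{k-1}(P_Z)$ we have $\mathcal{C}_k^{\gamma}(P_Z)\subseteq\mathcal{C}_{k-1}^{\gamma}(P_Z)$ for all $k$. We also observe that $\mathcal{C}_k^{\gamma}(P_Z)$ is closed for all $k$. This last step follows from the fact that our norm $\|\cdot\|$ agrees with the given topology. By Assumption 1 and (40) we have $\bigcap_{k=1}^{\infty}\mathcal{C}_k^{\gamma}(P_Z)\subseteq\overline{\Delta\cap\mathcal{C}_\infty(P_Z)}$. Since $\{\mathcal{C}_k^{\gamma}(P_Z)\}$ is a nested sequence of nonempty closed and bounded sets (the boundedness comes from the fact that the set of all channels is itself a bounded set), there exists $\pi\in\bigcap_{k=1}^{\infty}\mathcal{C}_k^{\gamma}(P_Z)\subseteq\overline{\Delta\cap\mathcal{C}_\infty(P_Z)}$. This would mean that

$$\inf_{\pi'\in\Delta\cap\mathcal{C}_\infty(P_Z)}\|\pi-\pi'\| \ge \frac{\gamma}{2},$$

which is false since $\pi\in\overline{\Delta\cap\mathcal{C}_\infty(P_Z)}$ and $\|\cdot\|$ is a continuous function. Hence (41) is wrong and

$$\lim_{k\to\infty}\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta\cap\mathcal{C}_k(P_Z)\right) = 0.$$

Therefore, for each $P_Z$, $\lim_{l\to\infty}a_l(P_Z,\Delta) = 0$. Since the set of distributions $\mathcal{S}$ is compact and from Assumption 1 we know that

$$\rho\left(\Delta\cap\mathcal{C}_\infty(P_Z),\ \Delta\cap\mathcal{C}_k(P_Z)\right)$$

is continuous in $P_Z$, Dini's Theorem implies the convergence is uniform in $P_Z$.

We can now state a generalized version of Theorem 2.

Lemma 7 For any $P_Z\in\mathcal{S}(\mathcal{A},\alpha)$, let

$$\hat{X}^n_{\mathrm{univ}} = \hat{X}^{n,k_n,l_n}_{\Delta} \tag{42}$$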
where on the right side is the n-block denoiser defined in |0j and {kn\, {1,^} are unbounded increasing 
sequences satisfying fc„ < and '^^Vn[kmln,5) < oo for every S > 0. If Coc{Pz) H A 7^ and 

sequence {A„} with A„ C A and |A„| — O ('e^^ , then 

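$$\limsup_{n\to\infty}\left[\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z,\Delta_n,Z) - \mu_{k_n}^{(n)}(P_Z,\Delta,Z)\right] \le 0 \quad P_Z\text{-a.s.} \tag{43}$$

Remarks: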



• The extreme detail of Lemma 7 makes it hard to extract any intuition from it. The main purpose of the lemma is to develop the subsequent Theorem 2 and Proposition 1.

• Note that the stipulation in the statement of the theorem that $\mathcal{C}_\infty(P_Z)\cap\Delta\neq\emptyset$ is not restrictive since the real channel is known to lie in $\Delta$.

• To avoid introducing additional notation, henceforth $\hat{X}^n_{\mathrm{univ}}$ denotes the denoiser defined in (42), rather than that of Theorem 1.

• It should be emphasized that the sequence $\{\Delta_n\}$ is not related to the construction of the denoiser. Rather, $\Delta_n$ is simply the subset of $\Delta$ on which performance is evaluated for the $n$-block denoiser (cf. (13)). Note that since the size of $\Delta_n$ is allowed to grow quite rapidly, one can choose a sequence $\{\Delta_n\}$ for which $\rho(\Delta_n,\Delta)\to 0$ quickly.

Proof:

We start by outlining the proof idea. Two ingredients that were absent in the setting of Theorem 1 and that now need to be accommodated are the fact that $\Delta$ is not necessarily finite, and that $\Delta$ need not be a subset of $\mathcal{C}_\infty(P_Z)$. The first ingredient is accommodated by evaluating performance, for each $n$, on a finite subset of $\Delta$, $\Delta_n$. For the second ingredient, a good thing to do would have been to employ the denoiser $\hat{X}^{n,k}_{\Delta'}$ taking $\Delta' = \Delta\cap\mathcal{C}_\infty(P_Z)$. Instead, the denoiser we construct in the present theorem is $\hat{X}^{n,k,l}_{\Delta}$. Lemmas 5 and 6 ensure that for large enough $l$, $\Delta_l$ is "close" to $\Delta\cap\mathcal{C}_\infty(P_Z)$ which, in turn, implies that the performance of the scheme that uses $\Delta_l$ is essentially as good as one which would be based on $\Delta\cap\mathcal{C}_\infty(P_Z)$. The bounds in the lemmas, when combined with the additional stipulation of Lemma 7 that $P_Z\in\mathcal{S}(\mathcal{A},\alpha)$, provide growth rates for $k$ and $l$ which guarantee that, under the $\rho$ metric, $\Delta_l(Q^{2l+1}[Z^n])\to\Delta\cap\mathcal{C}_\infty(P_Z)$ rapidly enough to ensure that the performance of $\hat{X}^{n,k_n,l_n}_{\Delta}$ converges to the performance of $\hat{X}^{n,k_n}_{\Delta\cap\mathcal{C}_\infty(P_Z)}$. It should be noted that the only point where the stationarity and mixing conditions on the noise-corrupted source are used is for the estimation of $\Delta\cap\mathcal{C}_\infty(P_Z)$. For a completely arbitrary $P_Z$, not necessarily stationary, if $\Delta\cap\mathcal{C}_\infty(P_Z)$ were given then the scheme of Theorem 1, where $\Delta\cap\mathcal{C}_\infty(P_Z)$ is used for $\Delta$, could be used, and the performance guarantees of Theorem 1 would apply. In the remainder of this subsection we give the rigorous proof of Lemma 7.

LemmaEland the fact that Pz € S{A, a) imply (recall H20() and (|37|l for definitions of bi and ai) 

Pz(p(AnCoo(Pz),Az(Q2'+M^"],A)) ><5) < a{nJ,bY\S - ai)). (44) 
Combined with ^ this implies 

Pz (max I J, (Q2fc+i[z"],AnCoo(Pz),/) - J;, (g^^+ifz"], A, (^Q'-'+'[Z^^i a) 



> S 



< Pz(p(^AnC^(Pz),A,(^Q^'+i[Z"],Ajj >0fci(<5) 



(45) 



Let now A[^] denote an r^-cover of A. Note that for all sample paths, by ^ and the fact that A„ C A 
implies p(A„ U A[^], A) < ry. 



max 



Jk 0'"+' [^"] , ( A„ U A[„] )nCooiPz)J)-Jk[ Q''+' [^"] , A n Coo (Pz ) , / < ^fe iv) • (46) 



The combination of (|46|l with (|45(l now implies 



Pz max 



Jk (Q2fc+M^"],(A„U A[,])nCoo(Pz),/) -Jfe [q'^+'IZ'^IA, (q2'+1[z"],a) ,/ 



>5] < Unik,l,TJ,S), 

(47) 



where Un was defined in ()38|) . Now, from the definition of X^^ ' , it follows that 



sup P[P^,n] 
{(Px,n):neA„uA[,],Px*n=Pz} 



L^„,.,,(X",Z")|Z 



' o ^ , ^[^^X^n] [Q2fc+l[2„],A,(Q2i + l[2,.],A)](^">^")l^ 

On the other hand, for every f J^k, 



sup 

{(Px,n):nGA„UA[, 



(48) 



{(Px,n):nGA 



sup Gfe(Q2fe+l[^n]^n,/ 

„uA[„],Px*n=Pz} ^ 



max Gfc (02^+i[Z"],n,/ 

ne(A„uA[^])nCoe(Pz) ^ 



= J, (Q^^+MZ"],(A„UA[„])nCoo(Pz),/) (49) 



implying, when combined with Lemma 2) that 



Pz max 



Jk O'^^+IZ"], (A„ U A[,]) nCoo(Pz), / 



sup P[p,,n][L/(X",Z")|Z] 
{(Px,n):neA„uA[„j,Px*n=Pz} 



< (|A„| + |A[„]|) -cxp 



-nV I fc, (5, Aniax, max | |n 
riGA 



(50) 



When (|50|l is combined with l|47|l as well as a union bound and a triangle inequality, we get 



Pz max 



(02'^+i[Z"],A, (02'+1[Z"],a) ,/) - sup i?[p,,n][P/(X",Z")|Z] 

^ ^ ^ ^ {(Px,n):nGA„uA[^],Px*n=Pz} 



< 



Un{k, 1, 7j, S/2) + (I A„| + I A[^] I) • exp 



-7ir ( /c,(5/2,Amax,niax||n ^||) 

\ HgA J 



(51) 
Since by the definition of /m



min JWq2'=+1[Z"],A, (Q^'+^[Z"],A) ,/ 



\2l+l \ym 



it follows that 



mm 



in Jfc (q2'^+M^"],A, (q2'+1[z"],a) ,/) - sup E^p^ 

\ V / / {(Px,n):n6A„uA[^j,Px*n=Pz} 



n] 



= Pz 



sup E\ 
{(Px,n):neA„uA[^j,Px*n=Pz} 



[^^x,n] 



(52) 

A 



> 



< C/„(fc,;,77,5/2) + (|A„| + |A[^]|) -exp 



-nr (fc,<5/2,A„ax,max||n-i| 



where /mm. = /mm. [g^'^'+M^"], A; (q2;+1[z"],A 
is due to (|51|l . On the other hand, 



, the equality is due to H52|l and H48|) , and the inequality 



Pz 



min Jk (Q^"+M^"],(AnU A[,,])nCoo(Pz),/) -Mi"^(Pz,A„UA[„],Z) 



> s 



< (|A„| + |A[^]|) -exp 



-nV I k, S, A,„ax, max | |n~ 



(54) 



implying, when combined with (|47|l and H53|l as well as a union bound and the triangle inequality, 

Pz ( I {Pz , A„ U A[^] , Z) - £ ^„^^^ (Pz , A„ U A[^] , Z) I > 

Pz I (Pz, A„ U A[„] , Z) - sup £;[Px,n] 

{(Px,n):neA„uA[^,,Px*n=Pz} 



L/j^n.kA (^^, Z^^^ \ 'Zl 



> 5 



—nV [ fc, (5/3, Aniax, max ||n 



< (|A„| + |A[„]|) -exp 

+C/„(fc,/,r;,5/6) + (|A„| + |A[„]|) - exp 



+ C/„(fc,;, 77,^3) 



-nr ( fc, (5/6, Amax, max | |n" 



(55) 



Choosing now k = fc„, ^ = Z„, ?y = r/„ = exp(— -^/n/lylp) and noting that A[,j] can be chosen such that 
|A[^]| < 77~l'^l leads to the bound on the right side of (|55|1 : 



(5/6) + 2 (|A„| + exp(-yn)) • exp 



^nr ( fc„,(5/6,Amax,max||n || 



(56) 



which is readily verified to be summable for all (5 > under the stipulated assumption on the growth rate of 

^univ A 



kn and Z„.^^ Since Xuniv — A"^''^"''" we obtain, by the Borel-Cantelli lemma. 



lim 

n — ^oo 



Mi"'(Pz, A„ U A[^„], Z) - £^„^^^(Pz, A„ U A[^„], Z)J = 



Pz - a.s. 



(57) 



The growth rate of $k_n$ stipulated in the theorem guarantees that $\exp\left[-n\,\Gamma(k_n,\delta/6,\Lambda_{\max},\max_{\Pi\in\Delta}\|\Pi^{-1}\|)\right] \le \exp(-n^{1/2+\varepsilon})$ for an $\varepsilon > 0$ and all sufficiently large $n$. The factor multiplying this exponent, $(|\Delta_n| + \exp(\sqrt{n}))$, is upper bounded by $O(\exp(\sqrt{n}))$. Combined with the stipulated summability of $V_n(k_n,l_n,\delta)$ this guarantees the summability of the expression in (56).

Thus we obtain $P_Z$-a.s.



limsup , (Pz,A„,Z) -Mi"^(/'z,A,Z) 



< lim sup 



0, 



C^„^^^^_ (Pz, A„ U A[^„] , Z) - fi^^'J (Pz, A„ U A[^„] , Z) 



where the inequahty is due to the facts that A„ C A„ U A[,j^j C A and that both Ci^„ (Pz,-,Z) and 
//["'(Pz, •, Z) arc increasing, and the equahty follows from H57|l . 

□ 

B Proof of Theorem 1

We start with an outline of the proof idea. The assumption that $\Delta\subseteq\mathcal{C}_\infty(P_Z)$ is finite, combined with Lemma 4 and the definition of $J_k$, implies that, for fixed $k$ and large $n$, $J_k(Q^{2k+1}[Z^n],\Delta,f)$ is uniformly a good estimate of $\mathcal{L}_f^{(n)}(P_Z,\Delta,Z) = \sup_{\{(P_X,\Pi):\,\Pi\in\Delta,\ P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}[L_f(X^n,Z^n)\mid Z]$. Thus, the performance of the sliding window denoiser $f$ that minimizes $J_k(Q^{2k+1}[Z^n],\Delta,f)$ is "close" to $\min_{f\in\mathcal{F}_k}\mathcal{L}_f^{(n)}(P_Z,\Delta,Z) = \mu_k^{(n)}(P_Z,\Delta,Z)$. The bounds in the lemmas of the preceding subsection allow us not only to make this line of argumentation precise, but also to find a rate at which $k$ can be increased with $n$, while maintaining the virtue of the conclusion. In the remainder of this subsection we give the rigorous proof.

For any pair (Px, H) such that 11 G A and Px * 11 = Pz, it follows from the definition of X^'*^ that 



El 



Pj.„,fc(X",Z")|Z 



L 



i[Z"],A](^"'^")l^ 



(58) 



and, therefore, 



max -E'fPx.ni 
{(Px,n):neA,Px*n=Pz} ' ^ ' 



P^„..(X",Z")|Z 



max ^fPx.ni 
{(Px,n):neA,Px*n=Pz} ' ^' ' 



"/mm, [Q2'=+1[2"].A](-^"'^")|2 

(59) 



On the other hand, the fact that A C C(Pz) implies that for every / e Tk 



max Gk Q^^-+M^"],n,/ = maxG,. g^^-+^[Z"],n, / = Jk{Q'^+'[Z%AJ) (60) 

{(Px,n):nGA,Px*n=Pz} V / nsA * ' ^ ' 



implying, when combined with Lemma 01 that 
Pzfmax Jfe(Q2fe+i[Z"],A,/) - 



max £;[p^.n][L/(X",Z")|Z] 
{(Px,n):neA,Px*n=Pz} ^ ^ ' ■< ^ 



> S 



~nT { fc,(5,A„iax,max||n ^1 



(61) 



< |A| • exp 

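Since, by the definition of $f_{\mathrm{mm}_k}$, $J_k\left(Q^{2k+1}[Z^n], \Lambda, f_{\mathrm{mm}_k}[Q^{2k+1}[Z^n], \Lambda]\right) = \min_{f\in\mathcal{F}_k} J_k\left(Q^{2k+1}[Z^n], \Lambda, f\right)$, it follows that

$$\begin{aligned}
&P_Z\left(\left| \min_{f\in\mathcal{F}_k} J_k\left(Q^{2k+1}[Z^n], \Lambda, f\right) - \max_{\{(P_X,\Pi):\, \Pi\in\Lambda,\, P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_{\hat{X}^{n,k}}(X^n, Z^n) \mid Z\right] \right| > \delta\right) \\
&\quad = P_Z\left(\left| J_k\left(Q^{2k+1}[Z^n], \Lambda, f_{\mathrm{mm}_k}[Q^{2k+1}[Z^n], \Lambda]\right) - \max_{\{(P_X,\Pi):\, \Pi\in\Lambda,\, P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_{f_{\mathrm{mm}_k}[Q^{2k+1}[Z^n],\Lambda]}(X^n, Z^n) \mid Z\right] \right| > \delta\right) \\
&\quad \le |\Lambda| \cdot \exp\left[-n\Gamma\left(k, \delta, \Lambda_{\max}, \max_{\Pi\in\Lambda}\|\Pi^{-1}\|\right)\right],
\end{aligned} \tag{62}$$

where the equality follows from (59) and the inequality from (61). Furthermore, another application of (61) yields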



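$$\begin{aligned}
&P_Z\left(\left| \min_{f\in\mathcal{F}_k} J_k\left(Q^{2k+1}[Z^n], \Lambda, f\right) - \mu_k^{(n)}(P_Z, \Lambda, Z) \right| > \delta\right) \\
&\quad = P_Z\left(\left| \min_{f\in\mathcal{F}_k} J_k\left(Q^{2k+1}[Z^n], \Lambda, f\right) - \min_{f\in\mathcal{F}_k} \max_{\{(P_X,\Pi):\, \Pi\in\Lambda,\, P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_f(X^n, Z^n) \mid Z\right] \right| > \delta\right) \\
&\quad \le |\Lambda| \cdot \exp\left[-n\Gamma\left(k, \delta, \Lambda_{\max}, \max_{\Pi\in\Lambda}\|\Pi^{-1}\|\right)\right],
\end{aligned} \tag{63}$$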



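which, when combined with (62), as well as the triangle inequality and a union bound, implies

$$P_Z\left(\left| \mu_k^{(n)}(P_Z, \Lambda, Z) - \max_{\{(P_X,\Pi):\, \Pi\in\Lambda,\, P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_{\hat{X}^{n,k}}(X^n, Z^n) \mid Z\right] \right| > \delta\right) \le 2|\Lambda| \cdot \exp\left[-n\Gamma\left(k, \tfrac{\delta}{2}, \Lambda_{\max}, \max_{\Pi\in\Lambda}\|\Pi^{-1}\|\right)\right]. \tag{64}$$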
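The passage from (62) and (63) to (64) can be spelled out in a few lines. The following is a short worked sketch; the letters $a$, $b$, $c$ are shorthand introduced here only for readability and are not the paper's notation.

```latex
% Shorthand (introduced for this sketch only, not the paper's notation):
%   a = \mu_k^{(n)}(P_Z,\Lambda,Z)
%   b = \min_{f \in \mathcal{F}_k} J_k(Q^{2k+1}[Z^n],\Lambda,f)
%   c = \max_{(P_X,\Pi)} E_{[P_X,\Pi]}[ L_{\hat{X}^{n,k}}(X^n,Z^n) \mid Z ]
\begin{align*}
|a-c| &\le |a-b| + |b-c|
  && \text{(triangle inequality)} \\
\{|a-c| > \delta\} &\subseteq \{|a-b| > \delta/2\} \cup \{|b-c| > \delta/2\} \\
P_Z(|a-c| > \delta) &\le P_Z(|a-b| > \delta/2) + P_Z(|b-c| > \delta/2)
  && \text{(union bound)} \\
&\le 2|\Lambda| \exp\!\Big[-n\,\Gamma\big(k, \tfrac{\delta}{2}, \Lambda_{\max},
  \max_{\Pi\in\Lambda}\|\Pi^{-1}\|\big)\Big]
  && \text{((63) and (62) with } \delta/2\text{)}
\end{align*}
```

The factor 2 and the argument $\delta/2$ in (64) both come from splitting the deviation budget evenly between the two terms.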



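Now, the bound on the growth of $k_n$ stipulated in the statement of the theorem is readily verified to guarantee that, for every $\delta > 0$, $\sum_n \exp\left[-n\Gamma\left(k_n, \tfrac{\delta}{2}, \Lambda_{\max}, \max_{\Pi\in\Lambda}\|\Pi^{-1}\|\right)\right] < \infty$ (see the footnote on the stipulated growth condition). Recalling that $\hat{X}^n_{\mathrm{univ}} = \hat{X}^{n,k_n}$, this implies via (64) and the Borel-Cantelli lemma that

$$\lim_{n\to\infty}\left[\mu_{k_n}^{(n)}(P_Z, \Lambda, Z) - \sup_{\{(P_X,\Pi):\, \Pi\in\Lambda,\, P_X*\Pi=P_Z\}} E_{[P_X,\Pi]}\left[L_{\hat{X}^n_{\mathrm{univ}}}(X^n, Z^n) \mid Z\right]\right] = 0 \quad P_Z\text{-a.s.}$$

From the notation defined in (14), we see this is exactly (17).

□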



C Proof of Corollary H 

The proof follows the same lines as the proof of Proposition ^ without the added complexity of an infinite 
A and having to estimate of A n C;(Q^'^^)- Hence we will omit the proof of Corollary [5| 

D Proof of Theorem 2

The main idea is to show that the $\psi$-mixing condition of Theorem 2 implies the conditions on $\alpha$ needed in Lemma 7. Once this is shown, it only remains to appeal to Lemma 3 to conclude the proof. To demonstrate that the $\psi$-mixing condition implies the conditions on $\alpha$, we break the $n$-block into sub-blocks separated by uniform gaps. By controlling the rate at which both the sub-blocks and the gaps grow with $n$, we can guarantee that the content of the gaps essentially does not affect the empirical distribution, while still letting the gaps grow with $n$. We then use the $\psi$-mixing condition and the fact that the gap size grows with $n$ to drive the joint distribution of the sub-blocks toward that of independent sub-blocks. This allows us to bound uniformly the rate of convergence of the empirical distribution to the true distribution, which is exactly what is needed for a bound on $\alpha$. We can then apply Lemma 3.
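The blocking construction described above can be illustrated with a small index-bookkeeping sketch. It is only an illustration under assumed names (`blocks_with_gaps`, and the block length `m` and gap length `r` as free parameters); it does not reproduce the paper's exact indexing.

```python
def blocks_with_gaps(n: int, m: int, r: int):
    """Split indices 0..n-1 into sub-blocks of length m separated by gaps of length r.

    Returns (kept, gaps, leftover): `kept` are the sub-blocks whose joint law is
    compared with a product law, `gaps` are the separators that supply the
    mixing distance, and `leftover` is the tail shorter than m + r.
    """
    kept, gaps = [], []
    start = 0
    while start + m + r <= n:
        kept.append(range(start, start + m))
        gaps.append(range(start + m, start + m + r))
        start += m + r
    leftover = range(start, n)
    return kept, gaps, leftover


kept, gaps, leftover = blocks_with_gaps(n=23, m=4, r=2)
# Number of full (sub-block, gap) pairs is the largest N with n >= N * (m + r).
print(len(kept), list(kept[1]), list(gaps[0]), list(leftover))
```

The discarded positions number at most `N * r + (m + r)`, so their fraction stays small whenever `r` is small relative to `m`, which is the sense in which the gaps "essentially do not affect" the empirical distribution.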

'^^Thc stipulated growth condition is readily seen to imply for any e > exp [— nA (fc, <5, Amax, ||n~-^||)] < exp(— c^n''/*"'^), 
exp [— n_B(fc, 5, Amax, lln"-""!!)] < cxp{— Cf n^/^^^/^) and, consequently, exp [— riF (fc, 5, Amax, maxngK ||n~"'"||)] < 
exp(— c^n-'^''^) (recall I24i . 1311 and 1361 for definitions of these quantities). 
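Fixing $l$ and $\epsilon > 0$, we begin by showing bounds on

$$P_Z\left(\left\| P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n] \right\| > \epsilon\right).$$

Using the union bound we have

$$P_Z\left(\left\| P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n] \right\| > \epsilon\right) \le \sum_{x^{2l+1} \in \mathcal{A}^{2l+1}} P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right). \tag{65}$$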



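For each $x^{2l+1} \in \mathcal{A}^{2l+1}$,

$$Q^{2l+1}[z^n](x^{2l+1}) = \frac{1}{n_l} \sum_{i=1}^{n_l} Y_i(x^{2l+1}),$$

where $Y_i(x^{2l+1})$ is the indicator function of the event that the $i$-th length-$(2l+1)$ block of $Z^n$ equals $x^{2l+1}$, and $n_l = \lfloor n/(2l+1) \rfloor$. For the sake of notational simplicity, we will fix $x^{2l+1} \in \mathcal{A}^{2l+1}$ and write $Y_i$ for $Y_i(x^{2l+1})$. Since $Z$ is $\psi$-mixing with coefficients $\{\psi_i\}$, $Y$ is $\psi$-mixing with coefficients $\psi'_i \le \psi_{i-2l-1}$ for all $i > 2l+1$.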
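The block-based empirical distribution used here can be sketched as follows. This is a minimal illustration assuming, as the definition $n_l = \lfloor n/(2l+1) \rfloor$ suggests, that the blocks are consecutive and non-overlapping; the function name is hypothetical.

```python
from collections import Counter


def block_empirical_distribution(z, l):
    """Empirical distribution of consecutive non-overlapping (2l+1)-blocks of z.

    Assumption for this sketch: block i covers positions i*(2l+1) .. (i+1)*(2l+1)-1,
    and the trailing symbols that do not fill a whole block are ignored.
    """
    width = 2 * l + 1
    n_l = len(z) // width
    if n_l == 0:
        raise ValueError("sequence is shorter than one block")
    counts = Counter(tuple(z[i * width:(i + 1) * width]) for i in range(n_l))
    # Each entry is (1/n_l) * sum_i Y_i(x) for the corresponding block value x.
    return {block: count / n_l for block, count in counts.items()}


z = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
q = block_empirical_distribution(z, l=1)
print(q)
```

Here the three full blocks are `(0, 1, 0)`, `(0, 1, 0)` and `(1, 1, 1)`; the final symbol is dropped because it does not complete a block.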
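We now define $S_{n_l} = \sum_{i=1}^{n_l} Y_i$. Therefore we have

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) = P_Z\left(\left| S_{n_l} - n_l P_{Z_{-l}^{l}}(x^{2l+1}) \right| > n_l \epsilon\right).$$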

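We can further decompose this as

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le P_Z\left(S_{n_l} > n_l\left(P_{Z_{-l}^{l}}(x^{2l+1}) + \epsilon\right)\right) + P_Z\left(S_{n_l} < n_l\left(P_{Z_{-l}^{l}}(x^{2l+1}) - \epsilon\right)\right).$$

In order to make use of the Chernoff bound, we rewrite the above as

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le P_Z\left(S_{n_l} > n_l\left(P_{Z_{-l}^{l}}(x^{2l+1}) + \epsilon\right)\right) + P_Z\left(n_l - S_{n_l} > n_l\left(1 - P_{Z_{-l}^{l}}(x^{2l+1}) + \epsilon\right)\right).$$

Using the Chernoff bound we have

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le e^{-n_l t (p+\epsilon)} E\left[e^{t S_{n_l}}\right] + e^{-n_l t (\bar{p}+\epsilon)} E\left[e^{t (n_l - S_{n_l})}\right], \tag{66}$$

where $p = P_{Z_{-l}^{l}}(x^{2l+1})$, $\bar{p} = 1 - p$ and $t > 0$. Choose $r > 2l+1$ and $m \in \mathbb{N}$ large enough such that

$$1 + \psi'_r \le 1 + \psi_{r-2l-1} \le \min\left\{ e^{\frac{1}{2} D(p+\epsilon/2 \| p)},\; e^{\frac{1}{2} D(\bar{p}+\epsilon/2 \| \bar{p})} \right\}$$

and $m > 2(r+1)/\epsilon$, where $D(p+\epsilon \| p)$ is the Kullback-Leibler distance between the Bernoulli$(p+\epsilon)$ and Bernoulli$(p)$ distributions.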

We now turn our attention to bounding $S_{n_l}$. Letting

$$N_{n_l} = \max\{N \in \mathbb{N} : n_l \ge N(m+r)\}$$



we have 



Sni — ^ Yi 

i=l 

j=0 j=l I i=N^I„i+r) + l 



(a) 

< iV„,(r + l)+ ^ ^y,w„, 

j=0 i=l 



(67) 
where (a) comes from the fact that $Y_i \in \{0, 1\}$ and the definition of $N_{n_l}$. Similarly we can derive the bound

Sn, > E^^-^+- (68) 

Combining H66() . H67() and (|68|l we have 

[ ^) (69) 

r 

gt7V„,(r+l)g-n,t(p+e) ^ ^ 



< E 



n 

J=0 



-nit(p+e) 



Since $Y$ is $\psi$-mixing, we know that the Radon-Nikodym derivative of the joint distribution of $(Y_1, \ldots, Y_m)$ and $(Y_{m+r+1}, \ldots, Y_{2m+r})$ with respect to the product of the marginals is less than or equal to $1 + \psi'_r$. Hence (69) gives us



> e 



By our choice of r and m we get 



„tAr„,(r+l) -n,t(p+e) 



> e 



< 



Nr 



' g-t'Ar„^m(p+e/2)g^£i(p+e/2||p) 



.(70) 



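We also know that $E\left[e^{t S_m}\right]$, subject to the constraints that $E[S_m] = mp$ and $S_m \in [0, m]$, is maximized when $S_m$ equals $m$ with probability $p$ and $0$ with probability $\bar{p}$. Hence

$$E\left[e^{t S_m}\right] \le p e^{tm} + \bar{p}. \tag{71}$$

Similarly

$$E\left[e^{t (m - S_m)}\right] \le \bar{p} e^{tm} + p. \tag{72}$$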



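Combining (70), (71) and (72) we get

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le \left[ e^{\frac{1}{2} D(p+\epsilon/2\|p)} \left(p e^{tm} + \bar{p}\right) e^{-tm(p+\epsilon/2)} \right]^{N_{n_l}} + \left[ e^{\frac{1}{2} D(\bar{p}+\epsilon/2\|\bar{p})} \left(\bar{p} e^{t'm} + p\right) e^{-t'm(\bar{p}+\epsilon/2)} \right]^{N_{n_l}}.$$

Since the above inequality is true for all $t, t' > 0$, we can take the infimum over all $t, t' > 0$ and get

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le \left[ e^{\frac{1}{2} D(p+\epsilon/2\|p)} \inf_{t>0} \left(p e^{tm} + \bar{p}\right) e^{-tm(p+\epsilon/2)} \right]^{N_{n_l}} + \left[ e^{\frac{1}{2} D(\bar{p}+\epsilon/2\|\bar{p})} \inf_{t'>0} \left(\bar{p} e^{t'm} + p\right) e^{-t'm(\bar{p}+\epsilon/2)} \right]^{N_{n_l}}. \tag{73}$$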



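Since $D(p+\epsilon \| p)$ is the rate function for a Bernoulli$(p)$ process, it follows that the infimum in (73) yields

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le e^{-\frac{N_{n_l}}{2} D(p+\epsilon/2\|p)} + e^{-\frac{N_{n_l}}{2} D(\bar{p}+\epsilon/2\|\bar{p})}. \tag{74}$$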
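The identity behind this step is that, writing $s = tm$ and $q = p + \epsilon/2$, the infimum over $s > 0$ of $(p e^{s} + 1 - p)\, e^{-sq}$ equals $e^{-D(q\|p)}$. The following sketch checks this numerically; the function names are illustrative and the closed-form minimizer $s^* = \ln\frac{q(1-p)}{p(1-q)}$ is the standard one for the Bernoulli Chernoff bound, not something taken from the paper's text.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """Kullback-Leibler divergence D(q || p) between Bernoulli laws, in nats (0 < p < 1)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def chernoff_objective(s: float, p: float, q: float) -> float:
    """The quantity (p e^s + 1 - p) e^{-s q} that is minimized over s > 0."""
    return (p * math.exp(s) + 1 - p) * math.exp(-s * q)


def chernoff_infimum(p: float, q: float) -> float:
    """Infimum over s > 0 of the objective, for 0 < p < q < 1, via the closed-form minimizer."""
    s_star = math.log(q * (1 - p) / (p * (1 - q)))
    return chernoff_objective(s_star, p, q)


p, eps = 0.3, 0.2
q = p + eps / 2
print(chernoff_infimum(p, q), math.exp(-kl_bernoulli(q, p)))  # the two values agree
```

Raising the per-block value $e^{-D}$ to the power $N_{n_l}$ and multiplying by the mixing penalty $e^{\frac{1}{2} D N_{n_l}}$ leaves the exponent $-\frac{N_{n_l}}{2} D$ that appears in (74).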
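We can now further upper bound by taking the maximum over $p$. Letting

$$p^*(a) = \arg\min_{p \in [0,1]} D(p + a \| p),$$

further bounding of (74) yields

$$P_Z\left(\left| P_{Z_{-l}^{l}}(x^{2l+1}) - Q^{2l+1}[Z^n](x^{2l+1}) \right| > \epsilon\right) \le 2 e^{-\frac{N_{n_l}}{2} D\left(p^*(\epsilon/2)+\epsilon/2 \,\|\, p^*(\epsilon/2)\right)}. \tag{75}$$

Since (75) is true for all $x^{2l+1} \in \mathcal{A}^{2l+1}$, (65) and the definition of $N_{n_l}$ yield

$$P_Z\left(\left\| P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n] \right\| > \epsilon\right) \le 2 |\mathcal{A}|^{2l+1} e^{-\frac{1}{2}\left(\frac{n}{(m+r)(2l+1)} - 2\right) D\left(p^*(\epsilon/2)+\epsilon/2 \,\|\, p^*(\epsilon/2)\right)}. \tag{76}$$

Further upper bounding $D\left(p^*(\epsilon/2)+\epsilon/2 \,\|\, p^*(\epsilon/2)\right)$ by $D(1/2+\epsilon \| 1/2) \le \log(1+2\epsilon)$ we obtain

$$P_Z\left(\left\| P_{Z_{-l}^{l}} - Q^{2l+1}[Z^n] \right\| > \epsilon\right) \le 2(1+2\epsilon) |\mathcal{A}|^{2l+1} e^{-\frac{n}{2(m+r)(2l+1)} D\left(p^*(\epsilon/2)+\epsilon/2 \,\|\, p^*(\epsilon/2)\right)}, \tag{77}$$

the bound in (77) being valid for all $n$ (since if $n < (m+r)(2l+1)$ the bound is greater than 1)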
without loss of generality assume e < 1. Hence Pz £ S{A, a^) where a^, is defined by 

with ?■ > 2/ + 1 and m G N chosen such that 

1 + = 1 + < ^l/2D(p'{e/2)+e/2\\v'{e/2))^ 

and m > 2(r + 
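The elementary bound $D(1/2+\epsilon \| 1/2) \le \log(1+2\epsilon)$ used in passing to (77) can be checked numerically. The sketch below does so for a few values of $\epsilon$, restricted to $0 < \epsilon < 1/2$ so that $1/2 + \epsilon$ is a valid Bernoulli parameter (that restriction is an assumption of this sketch); the helper name is illustrative.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """Kullback-Leibler divergence D(q || p) between Bernoulli laws, in nats (0 < p < 1)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


for eps in (0.05, 0.1, 0.25, 0.4):
    half_step = kl_bernoulli(0.5 + eps / 2, 0.5)  # upper-bounds the minimum over p of D(p + eps/2 || p)
    full_step = kl_bernoulli(0.5 + eps, 0.5)
    log_bound = math.log(1 + 2 * eps)
    # Chain used in the text: min_p D(p + eps/2 || p) <= half_step <= full_step <= log(1 + 2 eps)
    print(eps, half_step <= full_step <= log_bound)
```

The last inequality holds because $D(1/2+\epsilon\|1/2) = (1/2+\epsilon)\log(1+2\epsilon) + (1/2-\epsilon)\log(1-2\epsilon)$, where the second term is non-positive and the first factor is at most 1.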

We now turn to bounding Vn as defined in 1)39(1 . We first define the following 



^(1) ^ |^|2fe+iA,„,,max||n-i| 



(2) 



max||n-i| 



-1-41' 



For 5 > 0, we can now expand Vn as follows 



Vn{k,l,5) = a^(n,/,6r'(C'('5-0fe(??))-aO) 



(2) 



For a given sequence {/n}, choose 



/c,, < min 



Inn 



In a. 



16 In 1^1' 4 In 1^1 
This restriction on $k_n$ assures us that there exists $N'$ such that

fj{2) 

-^5 - d'^\ - >0 Vn>iV'. 
We now choose

<?n ^ niax{4n^/^, — Ina;^^, l„} 

and define 

Notice that e„ is monotonically decreasing to and that e„ is independent of S. Also, by our choice of 
we are assured that there exists N" such that 

-^(5 - C;^f^?7 - C^l^ai^ > Sn Vn > N" . (80) 
Combining the monotonicity of a^{n,l,e) in e and H8U|I gives the following: If 

oo 

^a^(n,l„,e„) < oo, (81) 

then 

oo 

5]K(Z«,;n,5) < oo V(5>0. 

i=l 

We now construct an unbounded sequence {tUnlJ^i- For n small, Wn can be chosen arbitrary. For n 
large, let Wn be defined such that 

(2u-„ + 1) In 1^1 - -C{wn,{^P,}Zi) < -2 (82) 

where 

r (,„ uh ^ - (£,.„/2) + £^J2||p* (£^„/,)) 

with m-uj„, 7'uj„ G {1, 2, . . . , n} chosen such that r.u,^ > 2?i;„ + 1, 

l + ^^^^_2«,„-i < ei/2^(P*(^-/2)+--/2||P*(-''-/2)), (83) 

and 

2(?\«„ + 1) 
> . 

Notice that both $(2w_n + 1)\ln|\mathcal{A}|$ and $C\left(w_n, \{\psi_i\}_{i=1}^{\infty}\right)$ are decreasing in $w_n$. Furthermore, their dependence on $n$ comes only through the sequence $\{w_n\}$. Hence, combining the fact that $\psi_r \to 0$ and by allowing $w_n$ to grow slowly with respect to $n$, we can ensure that inequality (82) holds.

Expanding a^(n, /„,e„), we see that (|81ll holds whenever {/„} and {fc„} are unbounded sequences such 
that 

In < Wn (84) 

and 

In n In a; 



k„ < min 



161n|yt|' 41n|yt| 

Note, since {wn} is unbounded and from Lemma|H|we know that a/ 0, we can choose {/„} and {fc„} to 
be unbounded. Recall that a; is used to denote a;(Pz, A) and is a function of the distribution Pz- Hence 
although the constraint on $\{l_n\}$ is independent of $P_Z$, the constraint on $\{k_n\}$ is not. However, from Lemma
wc know that ai{Pz, A) ^ uniformly in Pz- Uniform convergence implies 

lim sup ai{Pz-,A.) — 0. 



I — ^oo 



Pzes 



We can therefore choose {fc„} independent of {ai{Pz, A)} and hence independent of Pz. In particular, we 
can choose {k„} unbounded and satisfying 



k„ < min ■ 



Inn 



lnaz„(Pz,A) 



16 In 1^1' 4 In 1^1 



(85) 



Theorem 2 now follows by applying the lemma invoked in the outline at the start of this proof, for any unbounded sequences $\{k_n\}$ and $\{l_n\}$ satisfying (84) and the preceding bound on $k_n$.

□

E Proof of Proposition 1

The idea of the proof that follows is to combine Lemma Lemma and the triangle inequality to get a 
bound on the terms of the limit in (23), and then to use Lemma 7 to show that the bound vanishes in the
limit.

Before going through the proof, we note that by the same argument as in the proof of Theorem 2 we
can construct sequences {A:„} and {In} such that for all Pz G S(A,a^), 



{ n,kn, 

\ maxneA ||n-i||A,nax 1^1 

C50 

< ^ (rt, fc„,e„) < (X) Ve > 0, 



6fc„+3 



(86) 
(87) 



where £„ and are defined as in the proof of Theorem [3 
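Lemma 7 gives us

$$\lim_{n\to\infty}\left[\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z, \Lambda, Z) - \mu_{k_n}^{(n)}(P_Z, \Lambda, Z)\right] = 0 \quad P_Z\text{-a.s.}$$

Letting $E_Z$ stand for expectation under $P_Z$ and taking expectations in the above equality, it follows from the bounded convergence theorem that

$$\lim_{n\to\infty} E_Z\left[\mathcal{L}_{\hat{X}^n_{\mathrm{univ}}}(P_Z, \Lambda, Z) - \mu_{k_n}^{(n)}(P_Z, \Lambda, Z)\right] = 0.$$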



Expanding the inner terms gives 



lim 

n — ^oo 



E7 



max £'[Px,n] 
(Px,n) ^ ^- ' 



L^„ , (X",Z")|Z 



E7 



min max £:rp^ ni [if(^",^")|Z]) 



0, 



where, for notational simplicity, we suppress the constraints on $(P_X, \Pi)$ in the maximization. Moving the
expectations in we get



lim sup 



m^ax i?[p,^n] 



min Ez 



max i?[p,,n] [i/(X", Z")|Z]) 

(Pk,!!) 



< 0. 
Defining



Bn = niin max Eip^ u]\Lf(X",Z^'')] — min Ez 

/6^fc„ (Px,n) L X, J L J V ^^^^^ 



max i?[p,,n] [%(X",Z")|Z] 

(Px.ll) 



max E[p^,u] 
(Px,n) ^ ^ 



L^,. - min max Eip^ m [L fiX'^ , Z'^)] + B„ 

J /ej^fc„ (Px^n) i ^' ■'^ 



gives us 

lim sup 

n — 'oo 

For notational convenience denote

$$g_{P_X,\Pi,f}(Z) = E_{[P_X,\Pi]}\left[L_f(X^n, Z^n) \mid Z\right].$$

Let $\delta > 0$.

Ez [5Px,n,/(Z)] - G,,. (Q2fe"+i[Z"],n,/ 



< 0. 



(89) 



< 



E[p^^u] [Lf{X^\Z^^)] - Gfe„ {Q''"+'[Z%nj 
Lf{X^,Z'')-Gu^ (Q2fe„+i[^„] n,/ 



Px^n] 



■^[Px.n] 



< E, 



[Px,n] 



%(X'\ Z") - Gk„ (Q2fe"+i[z"],n, / 



^[Px.n] fG,„ (Q2fe.+i[z"],n,/)l -G,„ (Q2fc.+i[z"],n,/ 



i?z [.9Px.n,/(Z)] - G,„ (Q2fc.+i[^"],n,/ 



(a) 

< A, 



(90) 



^[Px,n] [Gfe„ (Q2^-"+i[Z"],n,/)] -G,„ (Q2fe.+i[z"],n,/ 



where (a) follows from lemma Since H9Q(I holds for all (Px,n, /) we have 
Ez [ffPx,n,/(Z)] - G,„ (Q2fe.+i[z"],n, 



max max 
/e^A,.„ (Px,n) 

A,„axe""^^''"'*'^°=""""^''"^^ll""'ll^ +5 + max max 

/Gjp-fc„ (Px,n) 



To proceed, we establish the following. 
Claim 1 



< 



E, 



Px.ni 



(91) 



limsup max max 

„^oo /e^fc„ (Px,n) 



Px,n] 



Pz - a.s. 



Proof of Claim 1

The definition of Gk is readily seen to imply 



E, 



Px,n] 



-Gfc 



(0 



max||n ^IIAmaxl^l 



6fc„+3 



Ez 

111 A I i|6fc„+3 



max||n-^||A„ax|^| 





1,n,/) 






_Q2fe„ 





< 






By the construction of a^, for any e > we have 



Pz 



maxneA||n-i||Amax|^| 
Since by hypothesis the right-hand side is summable, the Borel-Cantelli lemma implies



6fc„+3 



Hm 



Px,n] 



Gfe„ ( o^^-'+i [z"] , n, / ) - Gfe„ ( Q'-^*" +1 [z"] , n, / 



= Pz - a.s. 



Note that for each n, 



-El 



[J'x.n] 



is continuous in $f$, that $f \in \mathcal{F}_{k_n} \subset \mathcal{F}_{\infty}$, and that, by Tychonoff's Theorem, $\mathcal{F}_{\infty}$ is compact. We can therefore
apply Dini's Theorem, which implies that the limit is uniform in $f$. Due to the finiteness of $|\Lambda|$, uniform
convergence in $f$ implies that the convergence is uniform in $(P_X, \Pi, f)$, thus establishing the Claim. □
Returning to the proof, the combination of Claim 1 and (91) gives



limsup max max 

n^oo /6-7^fc„ (Px.n 



Ez [,9Px,n,/(Z)] - Gfc„ (Q2'^"+MZ"],n,/ 
limsup An.axe-"^^'^'-'^^'^"''-'"'''^"^^ +S Pz- a.s 



< 



Since fc„ is chosen such that fc„ < y|^^, e-"^(''"-'^'^»--''°'^'^neA ||n i||) _^ q (^^^^^^1 lemmadfor what A can 
be chosen to be) and therefore 



limsup max max 

„_^oo f&y^k„ (Px^n) 



Ez [5Px,n,/(Z)] - Gfc„ (g2'^"+i[Z"],n,/ 



implying 



lim max max 

n^oo /e.Ffc„ (Px,n) 



Ez [5Px,n,/(Z)] - G,„ (02^"+i[^"],n,/ 
by the arbitrariness of $\delta$. We also note that



< S Pz — a.s. 



= Pz - a.s. 



min max £;z [gp^ n f(Z)l - min max Gfe f Q^'^^+HZ"!, H, / 
/6.Ffc„ (Px,n) '■' fe:F^„ (Px,n) "V ' ' ■ 



< 



max max 
/G^fc„ (Px,n) 



Ez [5Px,n,/(Z)] - Gfe„ (02^"+i[Z"],n,/ 



and therefore 



lim 



in max £;z [5Px n f(Z)] - min max f Q2''"+HZ"], H, / 
p-fc„(Px,n) /e^fc,. (Px,n) "V 

From Lemma 2 we have

Pz (|Gfe„ (02'="+MZ"],n,/) -5Px,n,/(Z)| >S)< e-^C^-.^^A^.iin-ii) 



= Pz - a.s. 



(92) 



Therefore, applying the union bound, we obtain 
Pz I max 



(Px,n) 
Since



max 



we have 

Hence 
Ez 



> 



max Gk„ (Q^'"+'[Z%nj) ~ max .9Px,n./(Z; 
(Px.n) "V I (Px.n) 



max 



max 

(i^x 



^7 



max gp^jiji^) 
(Px,n) 



max gPx,n,/(Z) 

(-^x,ll) 



Since this is true for all $f \in \mathcal{F}_{k_n}$ we have



max 
Since 



E-7 



max 



Ez 



max .gPx,n,/(Z) 
(Px,n)'^ X. ,jv y 



< A,„ax|A|e""-^(''"'''''^'"'""™''''"^^ll" '11^(5. 



max 


Ez 


max Gfe ( 






.(Px,n) " V 



min Ez 



max 
(^'x.n) 



we also have 



min Ez 



max Gfe„ (02fe"+i[z"],n,/ 
(J'x.n) 



min Ez 



min i?z 



max gPx,n,/(Z) 

(Px,n) 

max gPx,n./(Z) 

(Px,n) 



max gPx,n,/(Z) 

(.^x.iij 



> 



< 



A„iax|A|e""-^('="''^''^"="=^'"^''°^^ll" 'II' +(5 



and therefore 

Um sup 



min Ez 



max Gfe„ (02'^"+i[^"],n,/ 
(fx,n) V 

hmsup AniaxlAle-''^^*-"'*'-^— "^^''"'^-^ lin'll' + ,5 



min Ez 



,max gPx,n,/(Z) 



< 



1 sup 1 vniax I 
n — >oo 



Since A:„ is chosen such that A:„ < j^T^^p^j , lemma|21imphes that i? can be chosen such that e "^('^"■''^^max.maxneA l|n 
and therefore 

hm sup 

n — >oo 

implying, by the arbitrariness of (5 > 0, 



min Ez 


max Gfe ( 




.(Px,n) V 



— min Ez 



max ,9Px,n./(Z) 
(Px.n) 



lim 


min Ez 


max Gfe ( 


n — >oo 




.(Px,n) V 



- min Ez 

Before completing the proof, we shall need to establish the following. 
Claim 2 

Gfe„ (Q2fe"+i[Z"],n,/ 



max 9Px.n,/(Z) 

(Px.ll) 



0. (93) 



lim 

n— >cx3 



min i?z 



max 
.(Px,n) 



min max Gfe ( Q^'="+Mz"l, H, / 
/e.?-fc„ (Px,n) " 



Pz - a.s. 
Proof of Claim 2
Since 



mill Ez 



max 



max Gfc„ (Q2fe.+i[^"],n,/ 



Ez 



— mm 
feJ^k„ (Px 



max Gfe„ (Q2'=n+i[z"],n,/ 
Px.n) V 



< 



max 

(Px, 



X Gfe„ (Q2''^"+i[^"],n,/)l - max Gfe„ (q^''-+'[Z%I1, f 
n) V y (Px,n) V 



it is sufficient to show 
Ez 



lim max 



max 
(Px,n) 



- max Gk„ {Q''"+'[Z%n,f 
(Px.n) " ^ 



= Pz - a.s.. 



The definition of $G_k$, via an elementary continuity argument, is readily verified to imply



Ez 



max Gfc„ (Q2fc.+i[^"],n,/ 
(Px,n) \ 



max Gfe„ (Q2fe.+i[^"],n,/ 
(Px,n) \ 



< 



max||n-i||A„iax|-4| 



6fc„+3 



E, 



By the construction of a^, for any e > we have 





Ez 


max Gfc,, ( 






.(Px.n) V 



max 
(Px,n) 



>e < 



n, fc„, ■ 



y maxn6A||n-i||A„ax|^|''"+'V ■ 
Since by hypothesis the right hand side is summable, by the Borel-Cantelli lemma 



(94) 



Pz I lim sup 



Ey 



max Gfc„ (Q^'^"+^[Z"],n,/ 
(Px,n) 



inax Gfc„ (Q''-+'[Z%UJ 
(Px,n) ^ 



> e = 0. 



Since $\epsilon$ is arbitrary, we can take $\epsilon \to 0$ and get



lim 


Ez 


max Gfe„ ( 


n — >oo 




.(Px.n) V 



max 
(Px,n) 



Pz- a.s. 



The proof is now completed similarly to the proof of Claim 1. □

Equipped with Claim 2, we now complete the proof of Proposition 1 as follows. We have
lim sup \Bn\ < lim sup 

n — >oo n — >oo 

+ lim sup 

n — ^oo 

+ lim sup 



min max Ezlgp^n f(Z)]- min max Gfc g^''"+MZ"l, H, / 
/G.Ffc„ (Px,n) /e.F,„ (Px.n) "V 



min Ez 



mill Ez 



max Gfc„ (Q2fen+i[z"],n,/ 
(Px,n) V 

max Gfc„ fQ2fc.+i[^"],n,/ 
(Px,n) V 



min Ez 



max 5Px.n,/(Z) 
(Px,n) 



- min max Gfc g^'="+MZ"l, H, / 
/e.?-fc„ (Px.n) " ^ 



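From (92), (93), Claim 2 and the fact that $|B_n| \ge 0$ it follows that

$$\lim_{n\to\infty} |B_n| = 0 \quad P_Z\text{-a.s.} \tag{95}$$

Combined with (95) and (89) this gives

$$\limsup_{n\to\infty}\left[\max_{(P_X,\Pi)} E_{[P_X,\Pi]}\left[L_{\hat{X}^n_{\mathrm{univ}}}(X^n, Z^n)\right] - \min_{f\in\mathcal{F}_{k_n}} \max_{(P_X,\Pi)} E_{[P_X,\Pi]}\left[L_f(X^n, Z^n)\right]\right] \le 0. \tag{96}$$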
On the other hand, since $\hat{X}^n_{\mathrm{univ}} \in \mathcal{F}_{k_n}$,




When combined with (96), we get the desired result





□ 